REMARKS 

Claims 49, 54, 56-58, 63, 66, 72, 75, 77, and 79-80 are currently pending. Claims 56, 58 
and 75 are amended herein. Claim 77 is canceled. Support for the claim amendments is provided 
by the specification at, for example, page 48, lines 14-20, and page 122, Table I. No new matter is 
added by way of the claim amendments. Applicants respectfully request entry of the claim 
amendments and reconsideration in view of the following. After entry of the claim amendments, 
claims 49, 54, 56-58, 63, 66, 72, 75, and 79-80 will be pending. 

Rejection under 35 U.S.C. § 112, first paragraph - Enablement 

Claims 49, 54-63, 66, 72, 75, 77, 79, and 80 remain rejected under 35 U.S.C. § 112, first 
paragraph, as allegedly lacking enablement. Applicants note that claims 55 and 59-62 were 
previously canceled. Applicants respectfully traverse the rejections with respect to the remaining 
claims. 

As a preliminary matter, "[t]o be enabling, the specification of a patent must teach those 
skilled in the art to make and use the full scope of the claimed invention without 'undue 
experimentation.' . . . Nothing more than objective enablement is required, and therefore it is 
irrelevant whether this teaching is provided through broad terminology or illustrative examples." In 
re Wright, 999 F.2d 1557, 1561 (Fed. Cir. 1993). "The test is not merely quantitative, since a 
considerable amount of experimentation is permissible, if it is merely routine, or if the specification 
in question provides a reasonable amount of guidance with respect to the direction in which the 
experimentation should proceed." In re Wands, 858 F.2d 731, 737, 8 USPQ2d 1400, 1404 (Fed. Cir. 
1988) (citing In re Angstadt, 537 F.2d 489, 502-04, 190 USPQ 214, 217-19 (CCPA 1976)). 

Claims to polypeptides of SEQ ID NOs:3, 5, and 7, peptides, and methods of detecting 
cancer 

In the first grounds of rejection, claims 49, 54, 56, 57, 75 and 77, drawn to polypeptides of 
SEQ ID NOs:3, 5, and 7, peptides, and methods of detecting cancer are rejected as allegedly lacking 
enablement. Specifically, the Office has alleged that the specification fails to provide a nexus 
between the expression of 254P1D6B in cancerous tissues and the expression of the individual 
variant sequences claimed, and that undue experimentation would be required to determine the 
expression pattern of the claimed variants before the variant clones could be used as markers for 
cancer. The Office further asserts an even greater degree of undue experimentation would be 
required to determine if the claimed sequences are over-expressed in cancers other than those listed 
on page 122 in order to practice the full scope of the claimed method. Applicants respectfully 
disagree. 

Solely to advance prosecution, and without acquiescing to the Examiner's arguments, claim 
75 is amended to claim a method for detecting the presence of cancer in the tissues listed in Table I. 
See specification at page 122, Table I. Claim 77 is canceled. 

As previously noted, the specification provides Northern blot and PCR data for the so-called 
generic 254P1D6B protein (SEQ ID NO:3), which demonstrate that the gene of interest is 
upregulated in a variety of cancerous tissues relative to normal tissues. See the specification at, e.g., 
page 82, Example 4, and Figures 14 to 16. Applicants respectfully note that the use of Northern 
hybridization to measure gene expression was routine in the art at the time of the application. See, 
e.g., Sambrook & Russell, Molecular Cloning: A Laboratory Manual, 3rd Ed. (2001), Cold 
Spring Harbor Laboratory Press, at 7.21, 2nd paragraph (noting that "Northern hybridization 
became part of the standard repertoire of molecular biology almost immediately after the first 
descriptions of the method were published.") (attached as Exhibit A). 

Based on the expression data provided for the polypeptide of SEQ ID NO:3, the guidance 
provided by the specification, and the state of the art at the time of the application, applicants 
respectfully submit that the generation of expression data for the claimed polypeptides does not 
constitute "undue" experimentation. 

The Examiner has also asserted that there is no objective evidence that all the variants 
possess the same properties as the generic 254P1D6B sequence (SEQ ID NO: 3), or that the generic 
sequence is over-expressed in malignancies such as sarcomas, melanomas, etc. 

Applicants respectfully note that genetic variation between the sequences does not 
necessarily imply that each variant has a unique function. Moreover, the biological function of the 
individual protein variants is irrelevant to the use of the claimed proteins as a family of markers for 
the detection of cancer by monitoring expression levels in test tissues. The latter issue, regarding 
over-expression in malignancies such as sarcomas, etc., is obviated by the amendment to claim 75. 

Finally, the Examiner asserts that the specification does not teach how to use the peptides of 
claims 54 and 56 if said peptides do not generate an antibody which binds a polypeptide associated 
with a cancerous state. (See Office action at page 3). 

Applicants respectfully note that claims 54 and 56 do not require that the claimed peptides 
generate an antibody which binds a polypeptide associated with a cancerous state. Rather, claims 
54 and 56 require that the claimed peptides induce a specific antibody response against a 
polypeptide having the amino acid sequence of SEQ ID NO:3, 5, or 7. 

In view of the foregoing, applicants respectfully submit that the claims, as amended, are 
fully enabled. Applicants respectfully request that the enablement rejection as it relates to claims 
49, 54, 56, 57 and 75 be withdrawn. Claim 77 has been canceled, rendering the rejection moot. 

Claims to methods of generating an immune response 

Under the second grounds for rejection, the Office action states that claims 58-62, 79 and 80 
are drawn to a method of generating an immune response in a mammal comprising exposing cells of 
said mammal to a polynucleotide of SEQ ID NO:3, 5, or 7. (See Office action at page 4). 

Applicants respectfully note that claims 59-62 were previously canceled. Claims 79 and 80 
are drawn to polynucleotides encoding the polypeptides of SEQ ID NOs:3, 5, and 7, wherein the 
polypeptide is encoded as a portion of a viral vector. Clarification of the grounds of rejection with 
respect to claims 79 and 80 is requested. Applicants will address the third grounds of rejection as it 
relates to claim 58. 

The Examiner asserts that the claims encompass the generation of cytotoxic T cells which 
kill the autologous cells which express said proteins. The Examiner also asserts that it is well known 
in the art that an antibody must bind to a cell surface target, and cites Abbas et al. to support the 
contention that targets which evoke complement or ADCC-mediated cell killing must be on the cell 
surface. The Examiner further asserts that there is no evidence in the specification that the 
254P1D6B protein is a cell surface protein, and that undue experimentation, without a reasonable 
expectation of success, would be required for one of skill in the art to use the claimed methods. 
Applicants respectfully disagree. 

Without acquiescing to the Examiner's argument, claim 58 is amended to clarify that the claim 
relates to a method of generating an immune response to a polypeptide having SEQ ID NO: 3, 5, 
or 7, wherein said immune response is the activation of B cells. 

For reasons of record, applicants respectfully submit that the Examiner's argument 
mischaracterizes Abbas et al. Moreover, the specification provides sufficient evidence such that a 
person of skill in the art would reasonably conclude that the 254P1D6B protein is present on the cell 
surface. Secondary structure and transmembrane (TM) domain predictions for the protein of 
254P1D6B v.1 (SEQ ID NO:3) are provided in Figure 13. For example, Figure 13C provides a 
schematic representation of the probability of the existence of a transmembrane region in the 
polypeptide of SEQ ID NO:3, based on the TMHMM algorithm of Sonnhammer et al., indicating 
that the polypeptide of SEQ ID NO:3 contains a single TM domain. See specification at, for 
example, page 7 and Figure 13C. 

At the time of the application, the use of computational methods to predict protein secondary 
structure and the presence of transmembrane domains was well-known in the art. See Chen et al., 
Protein Sci. 2002; 11:2774-2791 (attached as Exhibit B). Chen et al. reported that the TMHMM 
algorithm correctly identified ca. 90% of all observed membrane helices. See Chen et al., at page 
2777, left column and Table 1, and page 2778, Table 2 (Qhtm %obs, % of all observed helices that are 
predicted correctly). Based on the data provided in the specification for the polypeptide of SEQ ID 
NO:3, and the relatively high degree of predictive accuracy for the computational methods 
described, a person of skill in the art would have a reasonable expectation of success that the 
claimed polypeptides would be detectable using an antibody exposed to such a cell. 

The Examiner has cited copious references in support of the assertion that the state of the art 
with respect to treating patients with cancer by means of administering tumor antigens is 
unpredictable. Applicants respectfully note that the pending claims do not relate to methods of 
treating cancer patients by administering tumor antigens. The Examiner has further asserted that the 
specification does not provide any disclosure that the administration of the claimed polypeptides 
would generate CTLs. As amended, claim 58 recites a method of generating an immune response 
wherein said immune response is the activation of B cells. Accordingly, this basis of rejection may 
be properly withdrawn. 

Claims to polynucleotides and host cells 

Claims 63, 66, and 72, drawn to polynucleotides that encode polypeptides of SEQ ID NOs:3, 
5, and 7, polynucleotides having SEQ ID NOs:2, 4, and 6, and host cells, are rejected for the reasons 
stated by the Office with respect to the polypeptide claims. Specifically, the Office alleges that no 
specific data has been presented for the individual polynucleotides of SEQ ID NOs:2, 4, and 6 to 
provide a nexus between the claimed polynucleotides and the detection of cancer. Applicants 
respectfully disagree, for the reasons stated above with respect to the claimed polypeptides. 

As noted above, the specification provides actual data using sequences falling within the 
scope of the rejected claims to detect the over-expression of the gene of interest in cancer samples 
versus normal samples. See the specification at, e.g., page 82, Example 4, and Figures 14 to 16. The 
data provided by the applicants clearly demonstrate how one of ordinary skill in the art could use 
the claimed polynucleotides to detect cancer in a test tissue sample. 

In view of the data and guidance provided by the specification, as well as the routine nature 
of such experiments, applicants respectfully submit that the experimentation required to generate 
data for the individual claimed polynucleotide sequences is not "undue". Accordingly, Applicants 
request that the enablement rejection as it relates to claims 63, 66, and 72 be withdrawn. 

Rejection under 35 U.S.C. § 112, first paragraph - Written Description 

Claims 54 and 56 stand rejected under 35 U.S.C. § 112, first paragraph, as allegedly failing 
to comply with the written description requirement. Applicants note that claim 55 has been 
previously canceled. Applicants respectfully traverse the rejection. 

To satisfy the written description requirement, a patent application must describe the 
invention in sufficient detail that one of skill in the relevant art could reasonably conclude that the 
inventor was in possession of the claimed invention at the time the application was filed. See 
Vas-Cath Inc. v. Mahurkar, 935 F.2d 1555, 1563-64 (Fed. Cir. 1991). An applicant need not describe 
exactly the subject matter claimed in the specification in order to satisfy the written description 
requirement. See Union Oil of Cal. v. Atlantic Richfield Co., 208 F.3d 989, 997 (Fed. Cir. 2000). 
"What is conventional or well known to one of ordinary skill in the art need not be disclosed in 
detail." See Hybritech Inc. v. Monoclonal Antibodies, Inc., 802 F.2d at 1384, 231 USPQ at 94. 

In the applicants' response submitted on January 30, 2007, the applicants noted that the 
specification described polypeptide sequences which could be used to produce the claimed peptides. 
See specification at, for example, Figure 2. The applicants also described how to use the claimed 
peptides to induce a specific antibody response, as well as how to test for such a response. See 
specification at, for example, pages 90-92, Examples 10 and 11. Applicants have also identified 
specific amino acid regions of 254P1D6B variant 1 (SEQ ID NO:3) which may be used to induce an 
antibody response. See specification at, for example, page 90, Example 10, para. 2. 

Notwithstanding the foregoing, the Examiner has stated that, because the polypeptides of 
SEQ ID NOs:3, 5, and 7, and the claimed peptide fragments were not themselves conventionally 
known in the art, a connection between the epitopes and the induction of a specific antibody 
response cannot be inferred. (See Office action at page 9). Applicants respectfully disagree. 

The induction of specific antibody responses to peptides was conventionally known in the 
art at the time the present application was filed. See specification at, for example, page 63, line 36 
to page 64, line 4; see also Ausubel, et al., Eds., Current Protocols In Molecular Biology (2002), 

Application No.: 10/764,390 



Docket No.: 511582008100 



John Wiley & Sons, Inc., at Vol. 2, 11.16.1-11.16.5, and 11.16.16-11.16.19 (attached as Exhibit C). 
What has not been literally described by the applicants was thus conventionally known in the art at 
the time the present application was filed. 

In view of the applicants' disclosure, as well as what was known in the art at the time of the 
application, a person of ordinary skill in the art could readily conclude that applicants were in 
possession of the claimed invention when the application was filed. Accordingly, applicants 
respectfully request that the written description rejection under 35 U.S.C. § 112, first paragraph, be 
withdrawn. 

CONCLUSION 



In view of the above, each of the presently pending claims in this application is believed to 
be in immediate condition for allowance. Accordingly, the Examiner is respectfully requested to 
withdraw the outstanding rejection of the claims and to pass this application to issue. If it is 
determined that a telephone conference would expedite the prosecution of this application, the 
Examiner is invited to telephone the undersigned at the number given below. 

In the event the U.S. Patent and Trademark Office determines that an extension and/or other 
relief is required, applicants petition for any required relief including extensions of time and 
authorize the Commissioner to charge the cost of such petitions and/or other fees due in connection 
with the filing of this document to Deposit Account No. 03-1952 referencing docket 
no. 511582008100. However, the Commissioner is not authorized to charge the cost of the issue fee 
to the Deposit Account. 

Dated: June 11, 2007 Respectfully submitted, 




Registration No.: 54,708 
MORRISON & FOERSTER LLP 
12531 High Bluff Drive 
Suite 100 

San Diego, California 92130-2040 
(858) 314-7717 

NORTHERN HYBRIDIZATION IS USED TO MEASURE the amount and size of RNAs transcribed from 
eukaryotic genes and to estimate their abundance. No other method is capable of obtaining these 
pieces of information simultaneously from a large number of RNA preparations; northern analysis 
is therefore fundamental to studies of gene expression in eukaryotic cells. 

Northern hybridization became part of the standard repertoire of molecular biology almost 
immediately after the first descriptions of the method were published (Alwine et al. 1977, 1979). 
Although many variations and improvements (e.g., please see Kroczek 1993) have been published 
during the succeeding 20 years, the basic steps in northern analysis remain unchanged: 

• isolation of intact mRNA 

• separation of RNA according to size through a denaturing agarose gel 

• transfer of the RNA to a solid support in a way that preserves its topological distribution within 
the gel 

• fixation of the RNA to the solid matrix 

• hybridization of the immobilized RNA to probes complementary to the sequences of interest 

• removal of probe molecules that are nonspecifically bound to the solid matrix 

• detection, capture, and analysis of an image of the specifically bound probe molecules. 

There are choices at every step during the process and new alternatives continually appear 
in the literature. It is impossible to distill from this ferment the "best" combination of methods 
that can be universally applied in all situations. However, the methods described in the next five 
protocols are extremely robust and have worked well in a wide variety of circumstances. 

SEPARATION OF RNA ACCORDING TO SIZE 

Electrophoresis through denaturing agarose gels is used to separate RNAs according to size and is 
the first stage in northern hybridization. In earlier times, methylmercuric hydroxide (Bailey and 
Davidson 1976) achieved some degree of popularity, particularly among the brave and foolhardy. 
Although unparalleled as a denaturing agent, methylmercuric hydroxide is both volatile and 
extremely toxic (Cummins and Nesbitt 1978) and is therefore no longer recommended. The following 
are the two methods most commonly used today to separate denatured RNAs for northern 
analysis. 

• Electrophoresis of RNA denatured with glyoxal/formamide through agarose gels (Protocol 5) 
(Bantle et al. 1976; McMaster and Carmichael 1977; Goldberg 1980; Thomas 1980, 1983). 

• Pretreatment of RNA with formaldehyde and dimethylsulfoxide, followed by electrophoresis 
through gels containing up to 2.2 M formaldehyde (Protocol 6) (Boedtker 1971; Lehrach et al. 
1977; Rave et al. 1979). 

The two systems have approximately the same resolving power (Miller 1987), and the technical 
problems with both of them have long since been solved. For example, recirculation of electrophoresis 
buffer is no longer required when separating glyoxylated RNA in agarose gels and 
staining of RNA with ethidium bromide is now possible. However, glyoxal and especially 
formaldehyde retain some disadvantages, including toxicity. The choice between the systems 
therefore depends on the relative weight of these disadvantages, which will vary from one laboratory 
to the next. 

Many compounds other than glyoxal, formaldehyde, and methylmercuric hydroxide have 
been explored as denaturing agents for RNA during gel electrophoresis, but few of these have 
proven to be reliable in routine laboratory use. Guanidine thiocyanate is the only compound that 
may have advantages over formaldehyde or glyoxal (Goda and Minton 1995). When incorporated 
into an agarose gel at a final concentration of 10 mM, it maintains RNA in a denatured form. 
Electrophoresis may be carried out in standard TBE buffer and ethidium bromide may be incorporated 
in the gel. However, few laboratories have adopted the method, and at present, experience 
with this system is too limited for us to recommend that guanidine thiocyanate be used in place 
of glyoxal and formaldehyde. 

EQUALIZING AMOUNTS OF RNA IN NORTHERN GELS 

Equalizing the amounts of RNA loaded into lanes of northern gels is a thorny problem when a 
number of different samples are to be compared. Several different approaches are possible and 
none of them perfect: 

• Loading of equal amounts of RNA (usually 0.5-0.7 OD260 units) into each lane of the gel. 
rRNAs are the dominant components in preparations of total cellular RNA and contribute 
>75% of the UV-absorbing material. Northern analysis of equal quantities of total RNA shows 
how the steady-state concentration of target mRNAs changes with respect to rRNA content of 
the cell (Alwine et al. 1977; de Leeuw et al. 1989). Unlike the transcripts of housekeeping genes 
(see below), there is no evidence that the levels of 18S or 28S rRNA vary significantly from one 
mammalian tissue or cell line to the next (e.g., please see Bhatia et al. 1994). In addition, rRNA 
can easily be detected in agarose gels by staining with ethidium bromide instead of a second 
round of hybridization with a specific probe. 

• Normalizing samples according to their content of mRNAs of an endogenous, constitutively 
expressed housekeeping gene such as cyclophilin, β-actin, or glyceraldehyde-3-phosphate 
dehydrogenase (GAPDH) (Kelly et al. 1983). All three genes are expressed at moderately abundant 
levels (~0.1% of poly(A)+ RNA or 0.003% of total cellular RNA). Variations observed in 
the intensity of the hybridization signal of the gene of interest are often expressed relative to 
one of these three housekeeping genes. However, it turns out that the levels of expression of 
housekeeping genes are not constant from one mammalian tissue to another nor from one cell 
line to another (e.g., please see Spanakis 1993; Bhatia et al. 1994). Alterations in the relative 
intensity of the hybridization signals between the housekeeping gene and the gene of interest 
may therefore result from changes in the level of transcription of either gene or both. 

• Loading of equal amounts of poly(A)+ RNA. The poly(A)+ content of preparations of RNA can 
be compared by slot- or dot-blot hybridization to a radiolabeled poly(dT) probe (Harley 1987, 
1988). Equivalent amounts of poly(A)+ RNA can then be loaded into each lane of a northern 
gel. This is an attractive option because it measures changes in concentration of a specific 
mRNA relative to the total amount of gene transcripts in the cell. 

• Using a synthetic pseudomessage as a standard. Several groups (e.g., please see Toscani et al. 
1987; DuBois et al. 1993) have used RNAs synthesized in vitro as externally added standards to 
calibrate the expression of the gene of interest in different preparations of cellular RNA. The 
synthetic pseudomessage, which is engineered to be different in size from the natural message, 
is added in known amounts to samples at the time of cell lysis. The relative intensity of the 
hybridization signals obtained from the authentic and pseudomessages is used to estimate the 
expression of the endogenous gene of interest. 
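
The normalization approaches listed above share one piece of arithmetic: the hybridization signal of the gene of interest is divided by the signal of a standard in the same lane (rRNA, a housekeeping transcript, or an added pseudomessage), and the resulting ratios are compared between samples. The following is a minimal sketch of that calculation only. The function name, the lane labels, and the densitometry readings are hypothetical and are not taken from the text.

```python
from typing import Dict, Tuple


def normalized_signals(lanes: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return target/standard intensity ratios, one per lane.

    Each lane maps to (target_band_intensity, standard_band_intensity),
    both in the same arbitrary densitometry units.
    """
    ratios = {}
    for lane, (target, standard) in lanes.items():
        if standard <= 0:
            # A missing or zero standard band cannot be used for normalization.
            raise ValueError(f"lane {lane!r}: standard signal must be positive")
        ratios[lane] = target / standard
    return ratios


# Hypothetical readings (arbitrary units): (gene of interest, standard).
example_lanes = {
    "normal": (100.0, 400.0),
    "tumor": (750.0, 500.0),
}

ratios = normalized_signals(example_lanes)
# Relative expression of the tumor lane compared with the normal lane.
fold_change = ratios["tumor"] / ratios["normal"]
print(f"normalized ratios: {ratios}")
print(f"fold change (tumor vs. normal): {fold_change:.2f}")
```

As the text cautions for housekeeping genes, a change in this ratio may reflect a change in the standard rather than in the gene of interest, so the choice of standard matters more than the arithmetic.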

MARKERS USED IN GELS TO FRACTIONATE RNA 

The size of an RNA of interest can be measured accurately only when markers of known molecular 
weight are included in the gel. Four types of markers are commonly used: 

• RNA standards purchased from a commercial source. These standards are usually generated 
by in vitro transcription of cloned DNA templates of known length. As a consequence, the 
RNA standards are sometimes contaminated by template DNA and its associated plasmid 
sequences. Vector sequences present in the probe used in northern hybridization may 
hybridize to these remnants, generating on the autoradiogram either discrete bands or, more 
commonly, a smear where none should be. 

• DNA standards purchased from a commercial source. Glyoxylated denatured DNAs and 
RNAs of equal length migrate at equal speeds through agarose gels. Small DNAs of known size 
can therefore be used as markers in this system. Once again, however, there is a chance that vector 
sequences present in the probe may hybridize with the standards. At times, this can be an 
advantage because the signals generated by the marker bands on the autoradiogram can be 
used directly to measure the size of the RNA of interest. DNA standards should not be used as 
markers on gels containing formaldehyde since RNA migrates through these gels at a faster rate 
than DNA of equivalent size (Wicks 1986). 

• Highly abundant rRNAs (28S and 18S) within the RNA preparations under test. The sizes of 
these RNAs vary slightly from one mammalian species to another. 18S rRNAs range in size 
from 1.8 kb to 2.0 kb, whereas 28S RNAs range between 4.6 kb and 5.3 kb in length. 

• Tracking dyes. In most denaturing agarose gel systems, bromophenol blue migrates slightly 
faster than the 5S rRNA, whereas xylene cyanol migrates slightly slower than the 18S rRNA. 
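
One common way to turn marker positions into a size estimate is to fit a straight line to log10(size) against migration distance and read the unknown band off that line. The sketch below illustrates this under stated assumptions. The log-linear relationship is an approximation that the text does not itself state and that generally holds only over a limited size range. The marker distances and sizes are hypothetical, as are the function names.

```python
import math
from typing import List, Tuple


def fit_log_linear(markers: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size_kb) = slope * distance + intercept.

    markers: (migration_distance, size_kb) pairs for bands of known size.
    """
    if len(markers) < 2:
        raise ValueError("at least two markers are required")
    xs = [distance for distance, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("markers must have distinct migration distances")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size_kb(distance: float, slope: float, intercept: float) -> float:
    """Interpolate the size (kb) of a band from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical marker lane: (distance migrated in cm, size in kb).
marker_lane = [(2.0, 8.0), (4.0, 4.0), (6.0, 2.0), (8.0, 1.0)]
slope, intercept = fit_log_linear(marker_lane)

# Hypothetical band of interest that migrated 5.0 cm.
size_kb = estimate_size_kb(5.0, slope, intercept)
print(f"estimated size: {size_kb:.2f} kb")
```

Estimates are best restricted to distances that fall between the markers; extrapolating beyond the largest or smallest marker is where the approximation is least reliable.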

MEMBRANES USED FOR NORTHERN HYBRIDIZATION 

Transfer of electrophoretically separated DNA and RNA from gels to two-dimensional solid supports 
is a key step in northern hybridization. Initially, hybridization was carried out exclusively 
with RNA immobilized on activated cellulose papers (Alwine et al. 1977; Seed 1982a,b). However, 
it was soon realized that RNA denatured by glyoxal, formaldehyde, or methylmercuric hydroxide, 
like denatured DNA, binds tightly to nitrocellulose (Thomas 1980, 1983). For several years thereafter, 
nitrocellulose was the support of choice for northern hybridization. 

Unfortunately, nitrocellulose is not an ideal matrix for solid-phase hybridization because its 
capacity to bind nucleic acids is low (~50-100 ng/cm²) and varies according to the size of the 
RNA. In addition, the RNA becomes attached to nitrocellulose by hydrophobic rather than covalent 
interactions and therefore leaches slowly from the solid support during hybridization and 
washing at high temperatures. Finally, nitrocellulose membranes become brittle during baking 
under vacuum at 80°C, which is an integral part of the process to immobilize nucleic acids. The 
friable membranes cannot subsequently survive more than two to three cycles of hybridization 
and washing at high temperatures. 

These problems have been solved by the introduction of various types of nylon membranes 
that bind nucleic acids irreversibly, are far more durable than nitrocellulose filters (Reed and 
Mann 1985), and can be repaired if damaged (Pitas 1989). Immobilized nucleic acids can therefore 
be hybridized sequentially to several different probes. Furthermore, because nucleic acids can 
be immobilized on nylon in buffers of low ionic strength, transfer of nucleic acids from gels to 
nylon can be carried out electrophoretically. This advantage can be useful when capillary or vacuum 
transfer is inefficient, for example, when small molecules of RNA are transferred from polyacrylamide 
gels. 

Two types of nylon membranes are available commercially: unmodified (or neutral) nylon 
and charge-modified nylon, which carries amine groups and is therefore also known as positively 
charged or (+) nylon. Both types of nylon bind single- and double-stranded nucleic acids and 
retention is quantitative in solvents as diverse as water, 0.25 N HCl, and 0.4 N NaOH. Charge-modified 
nylon has a greater capacity to bind nucleic acids (see Table 7-3), but it has a tendency 
to give increased levels of background hybridization, which results, at least in part, from nonspecific 
binding of negatively charged phosphate groups in RNA to the positively charged groups on 
the surface of the polymer. However, this problem can usually be controlled by using increased 
quantities of blocking agents in the prehybridization and hybridization steps. 



Nylon is a generic name for any long-chain synthetic polymer having recurring polyamide (-CONH-) 
groups. Nylons of different types are formed from various combinations of diacids, diamines, and amino acids. 
In the standard nomenclature, a single numeral (e.g., nylon 6) indicates the number of carbon atoms in a 
monomer. Two numbers (e.g., nylon 6,6 or 66) indicate a polymer formed from diamines and dibasic acids. 
The first number indicates the number of carbon atoms separating the nitrogen atoms of the diamine, and the 
second number indicates the number of straight chain carbon atoms in the dibasic amino acid. 

Fiber 66, the original name of nylon, was developed in the 1930s by Wallace Carothers, a chemist working 
for DuPont (see Fenichell 1999). His discovery, which grew from a decade of research on the structure and 
assembly of long-chain polyamide polymers, should have been the capstone of his career, but instead was the 
catalyst to tragedy. Carothers, more a scientific aesthete than a twentieth century company man, became 
deeply depressed by the idea that he had discovered a material whose chief use seemed to be as a replacement 
for silk stockings. In 1937, a few days after filing his patent for Fiber 66, Carothers, just 41 years old, killed 
himself in a hotel room by swallowing cyanide. DuPont pressed ahead with the commercial development of 
Fiber 66 and, in a ceremony that would have been anathema to Carothers, dedicated the name nylon to the 
public domain at a Herald Tribune Forum in October of the following year. Stockings, of course, turned out to 
be just the first of a line of nylon products, some of which would surely have given Carothers great pleasure, 
including perhaps, nylon membranes for immobilizing nucleic acids. 

Different brands of nylon membranes are available that vary in the extent and type of 
charged groups and the density of the nylon mesh. Comparisons of the efficiency of these membranes 
for northern blotting and hybridization under various conditions are published from time 
to time (e.g., please see Khandjian 1987; Rosen et al. 1990; Twomey and Krawetz 1990; Beckers et 
al. 1994). In addition, each manufacturer provides specific recommendations for the transfer of 
nucleic acids to their particular product. The instructions given in Protocols 6 through 8 (northern 
hybridization) and in Chapter 6, Protocols 8-10 (Southern transfer) work well in almost all 
circumstances, and in some cases, yield results that exceed the manufacturer's standard. 

TABLE 7-3 Properties of Nylon Membranes Used for Immobilization of DNA and RNA 

Property                           Neutral Nylon        Charged Nylon 

Capacity (µg nucleic acid/cm²)     ~200-300             400-500 

Size of nucleic acid required      >50 bp               >50 bp 
for maximal binding 

Transfer buffer                    low ionic strength over a wide range of pH 

Immobilization                     baking for 1 hour at 70°C; no vacuum required 
                                   or 
                                   mild alkali 
                                   or 
                                   UV irradiation at 254 nm; damp membranes are generally 
                                   exposed to 1.6 kJ/m²; dried membranes require 160 kJ/m² 

Commercial products                Hybond-N             Hybond-N+ 
                                   Gene-Screen          Zeta-Probe 
                                                        Nytran+ 
                                                        Gene-Screen Plus 



TRANSFERRING RNA FROM GELS TO SOLID SUPPORTS 

The crucial step in northern analysis is the transfer of denatured RNA from the interstices of an 
agarose gel to the surface of a membrane. Transfer must be done in a way that not only preserves 
the distribution of the molecules along the length of the gel, but works efficiently for nucleic acids 
of quite different sizes. Over the years, many methods have been found to achieve these goals, 
including electroblotting, vacuum blotting, semidry blotting, and upward capillary blotting. In 
addition, several attempts have been made to avoid transfer completely by performing hybridization 
directly in the gel (e.g., please see Purrello and Balazs 1983; Tsao et al. 1983). However, it is 
not clear whether these techniques, which may require expensive pieces of equipment, are superior 
to the original method of upward capillary transfer (Southern 1975). Certainly, there does 
not seem to be any good reason to rush out and buy a vacuum blotting or electroblotting apparatus 
in the belief that it will significantly improve northern and Southern blots. 

• Upward capillary transfer. The original simple and economical technique devised by 
Southern (1975) involves an overnight transfer of nucleic acids from gel to membrane in an 
upward flow of buffer (please see Figure 7-2). A major drawback is selective retention of large 
molecules of nucleic acid within the gel, which is caused by flattening, compression, and
dehydration of the gel. This problem can be relieved (1) by using the thinnest gels possible, (2) by
ensuring that the filter papers in immediate contact with the gel are thoroughly saturated with 
buffer before transfer begins, and (3) by partial hydrolysis of RNA by alkali (Reed and Mann 
1985) before transfer. It is important that partial hydrolysis be used with moderation since 
overenthusiasm can generate fragments too short to bind efficiently to the membrane. 

Since 1975, the common practice has been to carry out upward capillary transfer for 16 
hours or so. However, ascending transfer is now known to be almost complete after 4 hours 
(Lichtenstein et al. 1990), and we now recommend much shorter transfer times. A more seri- 
ous problem with ascending transfer is the potential for some of the RNA to move from the 
gel in a descending direction counter to the flow of the buffer. This apparent anomaly occurs 
when the filter paper under the gel is not fully saturated with buffer. Fluid is then drawn from 
the gel, carrying with it some of the nucleic acid. The problem can be ameliorated by ensuring
that the bottom filter paper, like the top, is fully saturated with buffer and by working quickly 
to set up the remainder of the transfer system once the gel has been laid on the bottom filter. 

• Downward capillary transfer. Descending transfer (please see Figure 7-3) does not cause flat- 
tening of the agarose gel and results in a faster transfer of nucleic acid. RNA molecules up to 8 
kb in size, for example, are transferred with high efficiency within 1 hour at either neutral or 
alkaline pH (Chomczynski 1992; Chomczynski and Mackey 1994). The speed of downward 
capillary transfer therefore has particular advantage when carrying out alkaline blotting of 
RNA. Blotting of RNA for more than 4 hours significantly decreases the strength of the 
hybridization signal, presumably due to excessive hydrolysis of the RNA. 



FURTHER INFORMATION ABOUT NORTHERN HYBRIDIZATION 

Northern and Southern hybridizations have much in common, including, for example, the 
mechanics of hybridization, the types of probes, and the posthybridization processing of the 
membranes. All of these topics are discussed in depth in other areas within this manual. Signposts 
to this information are posted at relevant positions within the next five protocols. 
Abstract

Methods that predict membrane helices have become increasingly useful in the context of analyzing entire 
proteomes, as well as in everyday sequence analysis. Here, we analyzed 27 advanced and simple methods 
in detail. To resolve contradictions in previous works and to reevaluate transmembrane helix prediction 
algorithms, we introduced an analysis that distinguished between performance on redundancy-reduced high-
and low-resolution data sets, established thresholds for significant differences in performance, and imple- 
mented both per-segment and per-residue analysis of membrane helix predictions. Although some of the 
advanced methods performed better than others, we showed in a thorough bootstrapping experiment based 
on various measures of accuracy that no method performed consistently best. In contrast, most simple 
hydrophobicity scale-based methods were significantly less accurate than any advanced method as they 
overpredicted membrane helices and confused membrane helices with hydrophobic regions outside of 
membranes. In contrast, the advanced methods usually distinguished correctly between membrane-helical 
and other proteins. Nonetheless, few methods reliably distinguished between signal peptides and membrane 
helices. We could not verify a significant difference in performance between eukaryotic and prokaryotic 
proteins. Surprisingly, we found that proteins with more than five helices were predicted at a significantly 
lower accuracy than proteins with five or fewer. The important implication is that structurally unsolved 
multispanning membrane proteins, which are often important drug targets, will remain problematic for 
transmembrane helix prediction algorithms. Overall, by establishing a standardized methodology for trans- 
membrane helix prediction evaluation, we have resolved differences among previous works and presented 
novel trends that may impact the analysis of entire proteomes. 

Keywords: Sequence analysis; protein structure prediction; multiple alignments, predicting transmembrane 
helices; comparing genomes; bioinformatics; computational biology; proteomes 

Supplemental material: See www.proteinscience.org. 



Reprint requests to: Burkhard Rost. Department of Biochemistry and 
Molecular Biophysics, Columbia University. 650 W. 168 St., BB217, New 
Abbreviations: A-Cid, normalized hydrophobicity scale for α-proteins
(Cid 1992); Av-Cid, normalized average hydrophobicity scale (Cid 1992);
Ben-Tal, hydrophobicity scale representing the free energy of transferring
an amino acid from water into the center of the hydrocarbon region of a
lipid bilayer (Kessel and Ben-Tal 2002); BIG, nonidentical merger of
SWISS-PROT (Bairoch and Apweiler 2000) and TrEMBL (Bairoch and
Apweiler 2000) and PDB (Berman et al. 2000); BLAST, fast sequence
alignment method (Altschul and Gish 1996); Bull-Breese, Bull-Breese
hydrophobicity scale (Bull 1974); DSSP, program assigning secondary
structure (Kabsch and Sander 1983); Eisenberg, normalized consensus
hydrophobicity scale (Eisenberg et al. 1984); EM, solvation free energy
(Eisenberg and McLachlan 1986); EVA, server automatically evaluating
structure prediction methods (Eyrich et al. 2001a,b); Fauchere,
hydrophobic parameter π from the partitioning of N-acetyl-amino-acid amides
(Fauchere and Pliska 1983); GES, hydrophobicity property (Engelman et al.
1986; Prabhakaran 1990); Heijne, transfer free energy to lipophilic phase
(von Heijne and Blomberg 1979); HMM, hidden Markov model;
HMMTOP, hidden Markov model predicting transmembrane helices
(Tusnady and Simon 1998); Hopp-Woods, Hopp-Woods hydrophilicity value
(Hopp and Woods 1981); KD, Kyte-Doolittle hydropathy index (Kyte and
Doolittle 1982); Lawson, transfer free energy (Lawson et al. 1984); Levitt,
hydrophobic parameter (Levitt 1976); MaxHom, dynamic programming
algorithm for conservation weight-based multiple sequence alignment
(Sander and Schneider 1991); MEMSAT, dynamic-programming based
prediction of transmembrane helices (Jones et al. 1994); META-PP,
internet service allowing access to a variety of bioinformatics tools through one
single interface (Eyrich and Rost 2000); Nakashima, normalized
composition of membrane proteins (Nakashima et al. 1990); PDB, Protein Data
Bank of experimentally determined 3D structures of proteins (Bernstein et
al. 1977; Berman et al. 2000); PHDhtm, profile-based neural network
prediction of transmembrane helices (Rost 1996; Rost et al. 1996b);
PHDpsihtm, divergent profile (PSI-BLAST)-based neural network
prediction 2002); PSI-BLAST, position-specific iterated database search
(Altschul et al. 1997); Radzicka, transfer free energy from 1-octanol to
water (Radzicka and Wolfenden 1988); Roseman, solvation-corrected
side-chain hydropathy (Roseman 1988); SignalP, signal peptide prediction
(Nielsen et al. 1997a); SOSUI, hydrophobicity- and amphiphilicity-based
transmembrane helix prediction (Hirokawa et al. 1998); SPLIT,
transmembrane helix prediction (Juretic et al. 1998); Sweet, optimal matching
hydrophobicity (Sweet and Eisenberg 1983); SWISS-PROT, database of
protein sequences (Bairoch and Apweiler 2000); TM, transmembrane; TMAP,
alignment-based prediction of transmembrane helices (Persson and Argos
1996); TMH, transmembrane helix; TMHMM, transmembrane prediction
using cyclic hidden Markov models (Sonnhammer et al. 1998; Krogh et al.
2001); TMpred, prediction of transmembrane helices (Hofmann and Stoffel
1993); TopPred2, hydrophobicity-based membrane helix prediction (von
Heijne 1992; Cserzo et al. 1997); TrEMBL, translation of the
EMBL-nucleotide database coding DNA to protein sequences (Bairoch and
Apweiler 2000); Wolfenden, hydration potential (Wolfenden et al. 1981);
WW, Wimley-White hydrophobicity scale-based method (Wimley et al.
1996a,b; White and Wimley 1999; White 2001).

Terminology: Advanced prediction methods, all methods that do not
exclusively use a hydrophobicity scale; simple prediction methods,
membrane prediction methods exclusively based on hydrophobicity scales.

Formula abbreviations: htm, transmembrane helix; T, residue in
transmembrane helix; N, nonmembrane residue.

Article and publication are at
http://www.proteinscience.org/cgi/doi/10.1110/ps.0214502.

Helical membrane proteins challenge bioinformatics. Membrane
proteins are crucial for survival. They constitute key
components for cell-cell signaling, mediate the transport of
ions and solutes across the membrane, and are crucial for
recognition of self (Stack et al. 1995; Chapman et al. 1998;
Le Borgne and Hoflack 1998; Chen and Schnell 1999;
Hettema et al. 1999; Pahl 1999; Truscott and Pfanner 1999;
Bauer et al. 2000; Ito 2000; Soltys and Gupta 2000;
Thanassi and Hultgren 2000). Furthermore, the pharmaceutical
industry preferably targets membrane-bound receptors
(Heusser and Jardieu 1997; Bettler et al. 1998; Moreau and
Huber 1999; Saragovi and Gehring 2000; Sedlacek 2000).
Despite their great biological and medical importance, we
still have very little experimental information about their 3D
structures: <1% of the proteins of known structure are
membrane proteins. Fortunately, it is relatively easy to identify
the location of membrane helices through low-resolution
experiments. An expert-curated list of low-resolution
experiments maintained by Steffen Moller and colleagues
(Moller et al. 2000) considers information from C-terminal
fusions with indicator proteins (McGovern et al. 1991;
Hennessey and Broome-Smith 1993; Traxler et al. 1993; van
Geest and Lolkema 2000) and from antibody-binding
studies (Traxler et al. 1993; McGuigan 1994; Jermutus et al.
1998; Morris et al. 1998; Amstutz et al. 2001).
Nevertheless, we only have low-resolution experimental information
for <500 helical membrane proteins, and PDB (Berman et
al. 2000) contains <50 sequence-unique protein chains with
high-resolution helical membrane structures (Materials and
Methods). These numbers contrast with the >7000 helical
membrane proteins expected in humans alone (Wallin and
von Heijne 1998; Krogh et al. 2001; Liu and Rost 2001).
Thus, bioinformatics is challenged to help bridge the
information gap between what we want and what we have.

Published estimates for membrane helix prediction ques- 
tioned by recent analyses. Recently, a few groups have 
questioned the estimated levels of performance for mem- 
brane helix prediction methods. Moller, Croning, and Ap- 
weiler analyzed 14 prediction methods that did not use 
alignment information on a set of 188 proteins with experi- 
mentally known helices (Moller et al. 2000, 2001). They 
also applied the prediction methods to globular proteins and 
to signal peptides. The results indicated the following con- 
clusions: (1) The best prediction method (TMHMM, trans- 
membrane prediction using cyclic hidden Markov models) 
correctly predicts all membrane helices for 52%-69% of all 
proteins tested. (2) The best distinction between globular 
and membrane-helical proteins reaches levels of >97% for 
the globular proteins tested (TMHMM and SOSUI, hydro- 
phobicity- and amphiphilicity-based transmembrane helix 
prediction). (3) On a set of 34 signal and transit peptide 
proteins, the best methods reached 98% (PHDhtm, profile- 
based neural network prediction of transmembrane helices) 
to 100% (ALOM2) accuracy in distinguishing these from 
membrane helices. (4) The best simple hydrophobicity in- 
dex (KD, Kyte-Doolittle hydropathy index; Kyte and 
Doolittle 1982) correctly predicted all helices for 44% of all 
the proteins in a set for which HMMTOP (hidden Markov 
model predicting transmembrane helices; Tusnady and Si- 
mon 1998) reached only 43% accuracy. Another recent 
analysis was based on a set of 145 sequence-unique proteins 
(Ikeda et al. 2001). The researchers tested 10 prediction 
methods not using alignment information on their data set. 
In contrast to Moller et al., the investigators found that 
HMMTOP was not only much better than the KD hydro- 
phobicity index, but that it was the most accurate prediction 
method, correctly predicting all membrane helices for ~68%
of all proteins. Averaging over all 10 methods, the authors
found the resulting consensus prediction ~10 percentage
points more accurate than the best single method. The in- 
vestigators also claimed that prediction accuracy is higher 
for prokaryotes than for eukaryotes. They speculated that 
they found different levels of accuracy than Moller et al. 
because they used different percentages of prokaryotic pro- 
teins in their data sets. Jayasinghe, Hristova, and White 
analyzed four prediction methods on two different sets of 
proteins with known membrane helix locations: (1) on 150 
high-resolution structures from PDB, and (2) on 242 low- 
resolution proteins (Jayasinghe et al. 2001b). The research- 
ers found that the results between the high- and low-reso- 
lution sets differed marginally and reported that the best 
methods (PHDhtm and HMMTOP) correctly predict >93%-
97% of all helices. This group has also proposed a method 
based on a novel entropy-based hydrophobicity scale, 
namely, the Wimley-White scale (WW, Wimley-White hy- 
drophobicity-scale-based method), which is claimed to cor- 
rectly predict 99% of all membrane helices (Jayasinghe et 
al. 2001a). One major problem of hydrophobicity-based 
methods appears to be the poor distinction between mem- 
brane and globular proteins (Edelman 1993; Jones et al. 
1994; Rost et al. 1995, 1996b; Jayasinghe et al. 2001a; 
Moller et al. 2001). 
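
To make concrete what a simple hydrophobicity-scale method does, the sketch below slides a window along a sequence, averages the Kyte-Doolittle index (Kyte and Doolittle 1982) within each window, and reports every stretch covered by a window above a cutoff. This is only an illustration: the function names are hypothetical, and the window length of 19 and cutoff of 1.6 are commonly quoted defaults assumed here, not the parameters or the algorithm used in any of the analyses discussed above.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def predict_segments(seq, window=19, threshold=1.6):
    """Return (start, end) pairs (0-based, end exclusive) for stretches
    covered by at least one window whose mean reaches the threshold.
    window and threshold are assumed defaults, not values from the text."""
    covered = [False] * len(seq)
    for i, mean in enumerate(window_means(seq, window)):
        if mean >= threshold:
            for j in range(i, i + window):
                covered[j] = True
    segments, start = [], None
    # The trailing False closes a segment that runs to the end of seq.
    for i, flag in enumerate(covered + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    return segments
```

Because any window that is merely hydrophobic on average passes the cutoff, a scheme of this kind has no way to tell a membrane helix from a buried hydrophobic stretch in a globular protein, which is the weakness noted above.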

Problems with previous analyses. Previous analyses were 
limited in various ways. (1) Performance on high- and low- 
resolution data sets was distinguished by neither the Moller 
nor the Ikeda groups, although it seemed that performance 
differed between the two (Jayasinghe et al. 2001b). (2) The 
redundancy in data sets resulting from many copies of very 
similar proteins was not reduced by the Moller or Jayasinghe
groups. However, such bias is known to create problems
when estimating the accuracy of prediction methods (Rost and Sander
1993; Rost et al. 1995, 1996b; Rost 2002). (3) Neither 
Moller et al. nor Ikeda et al. tested any method based on 
alignment information, although such methods are known to 
be more accurate (Rost and Sander 1993; Persson and Argos 
1994; Neuwald et al. 1995; Rost et al. 1995; Rost 1996; 
Johnson and Church 1999). (4) No group explored per-residue
measures of prediction accuracy alongside per-segment measures.
Instead, all groups focused on one particular
definition of prediction accuracy; no two groups applied
the same definition. (5) No group established levels
for significant differences between methods. This makes it 
impossible to conclude whether or not differences between 
any two methods are relevant. In general, levels of signifi- 
cant differences typically depend on the data sets and the 
scores used (Eyrich et al. 2001; Rost and Eyrich 2001; 
Marti-Renom et al. 2002). (6) Only Moller and coworkers
tested proteins with signal peptides; however, their analysis 
was restricted to a small set of 34 proteins with known 
signal peptides. (7) No group analyzed more than 14 pre- 
diction methods. (8) Generally, prediction accuracy differs 
significantly between proteins used to develop a method and 
proteins never seen by a method (Moult et al. 1995, 1997, 
1999). For membrane proteins, this effect is very difficult to 
estimate because few high-resolution structures of membrane
proteins are added over the course of a year. Although
Moller et al. tried to estimate this effect by analyzing only 
proteins not used for developing a method, they did not rule 
out that the proteins tested in the category "not known to the 
method" were similar to proteins used for development. 
Surprisingly, Moller et al. found most methods to perform 
better on proteins not used for development. Given how 
prediction methods are developed, it is very unlikely that 
this result holds in general. Either the differences are not sig- 
nificant, or the data sets were not representative (or both). 



To resolve these limitations and to standardize membrane 
helix prediction performance comparisons, we have pre- 
sented an analysis that distinguished between performance 
on redundancy-reduced high- and low-resolution data sets, 
established thresholds for significant differences in perfor- 
mance by introducing a bootstrap experiment, and imple- 
mented both per-segment and per-residue analysis of mem- 
brane helix predictions. Additionally, we analyzed more 
methods (8 publicly available advanced prediction methods 
and 19 different hydrophobicity scales). In particular, we 
included alignment-based prediction methods. Furthermore, 
we tested membrane helix prediction methods on a large, 
representative set of 1418 unique signal peptides and 616 
unique globular protein folds taken from SCOP (Lo Conte 
et al. 2002). Although we confirmed many previous find- 
ings, overall our results differed greatly in detail from pre- 
vious publications. 
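
The bootstrap used to set thresholds for significant differences can be sketched as follows. The table footnotes report M = 100 and K = 18 for the 36 high-resolution proteins (K = 6 for the set of 13). The sketch reads M as the number of random subsets and K as the subset size, and draws each subset without replacement; both readings are assumptions, as are the function and parameter names.

```python
import random
import statistics


def bootstrap_error(per_protein_ok, k, m=100, seed=0):
    """Estimate the spread of a per-protein percentage score.

    per_protein_ok: one boolean per protein, True if all of its helices
        were predicted correctly.
    k: number of proteins drawn per subset (assumed meaning of K).
    m: number of subsets drawn (assumed meaning of M).
    Returns (mean, standard deviation) of the score over the m subsets.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        # Sampling without replacement is an assumption of this sketch.
        subset = rng.sample(per_protein_ok, k)
        scores.append(100.0 * sum(subset) / k)
    return statistics.mean(scores), statistics.stdev(scores)
```

The returned standard deviation plays the role of the error margin against which differences between two methods are judged.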

Results 

Accuracy in predicting membrane helices 

Prediction methods not significantly less accurate than low-resolution
experiments. We compared the membrane annotations
for 13 proteins for which we had both low-resolution
and high-resolution data available. Whereas ~94%-96% of 
the helices agreed between the two experimental methods, 
for only 11 of the 13 proteins did all helices overlap be- 
tween the two experimental methods (Table 1). Also, the 
two methods agreed on only 82% of all residue assignments 
(Table 1, $Q_2$, percentage of correctly predicted residues in
two states: membrane helix and non-membrane helix). A
detailed comparison of the percentage of identically assigned
membrane-helical residues confirmed that for most
cases, the differences arose from the longer segments observed
in the high-resolution data ($Q_{2T}^{\%obs} < Q_{2T}^{\%prd}$,
where $Q_{2T}^{\%obs}$ is the percentage of all observed TMH helix
residues that are correctly predicted and $Q_{2T}^{\%prd}$ is the
percentage of all predicted TMH helix residues that are correctly
predicted). Assuming that the high-resolution data
were correct, we can interpret the low-resolution data as an 
experimental prediction of transmembrane helices. Surpris- 
ingly, most prediction methods performed as well as the 
low-resolution experiments (Table 1). In fact, in terms of 
almost all measures for accuracy, we could find one method 
that numerically agreed more with the high-resolution data 
than the low-resolution experiment. However, given the 
small size of the data set, this statement ignored the error 
margins in the estimate for accuracy. 
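
The per-residue scores used above follow directly from their definitions and can be sketched in a few lines. The sketch assumes each protein is given as two equal-length strings over the two states T (residue in a transmembrane helix) and N (nonmembrane residue); the function name and the dictionary keys are illustrative only.

```python
def per_residue_scores(observed, predicted):
    """Two-state per-residue accuracy for one protein.

    observed, predicted: equal-length strings of 'T' and 'N'.
    Returns Q2 (percentage of residues assigned the correct state),
    Q2T_obs (percentage of observed T residues predicted as T) and
    Q2T_prd (percentage of predicted T residues observed as T).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pairs = list(zip(observed, predicted))
    correct = sum(o == p for o, p in pairs)
    both_t = sum(o == "T" and p == "T" for o, p in pairs)
    n_obs_t = observed.count("T")
    n_prd_t = predicted.count("T")
    return {
        "Q2": 100.0 * correct / len(pairs),
        "Q2T_obs": 100.0 * both_t / n_obs_t if n_obs_t else 0.0,
        "Q2T_prd": 100.0 * both_t / n_prd_t if n_prd_t else 0.0,
    }
```

For an observed helix `NNTTTTTTNN` and a shorter predicted helix `NNNTTTTNNN`, Q2T_obs falls below Q2T_prd, the same signature reported here for the shorter low-resolution segments.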

Simple hydrophobicity-based predictions were less accu- 
rate than advanced methods. Of the methods that only used 
hydrophobicity scales for prediction, none detected all 
membrane helices correctly for >70% of the high-resolution 
Table 1. Accuracy of low-resolution experiments and predictions

Per-segment accuracyᵇ    Per-residue accuracyᶜ

ᵃ Methods: see abbreviations at the beginning of the article.

ᵇ Per-segment accuracy: $Q_{ok}$ gives the percentage of proteins for which all TM helices are predicted correctly
(eq. 4), $Q_{htm}^{\%obs}$ the percentage of all observed helices that are correctly predicted (eq. 2), $Q_{htm}^{\%prd}$ the
percentage of all predicted helices that are correctly predicted (eq. 3), TOPO the percentage of proteins for which
the topology (orientation of helices) is correctly predicted (eq. 4, note: empty for methods that do not predict
topology).

ᶜ Per-residue accuracy: $Q_2$ is the percentage of correctly predicted residues in two states: membrane helix/
nonmembrane helix (eq. 6), $Q_{2T}^{\%obs}$ the percentage of all observed TMH helix residues that are correctly
predicted (eq. 7), $Q_{2T}^{\%prd}$ the percentage of all predicted TMH helix residues that are correctly predicted (eq. 8),
$Q_{2N}^{\%obs}$ the percentage of all observed non-TMH helix residues that are correctly predicted, and $Q_{2N}^{\%prd}$ the
percentage of all predicted non-TMH helix residues that are correctly predicted.
Note of caution: this data set of 13 proteins was too small to rank the prediction methods in any way!
Data set: 13 high-resolution membrane helical proteins from PDB for which we found low-resolution
experimental information in old versions of SWISS-PROT (labeled by LOW-RES). Note that the topology assessment
was based on only 8 of the 13 proteins for which we had this information.

ᵈ ERROR: The estimates for per-segment accuracy resulted from a bootstrap experiment with M = 100 and K
= 6 (Fig. 5); the estimates for per-residue accuracy were obtained according to equation 11.
ᵉ Numbers in italics: 2 standard deviations difference from baseline LOW-RES.



proteins (Table 2, $Q_{ok}$, percentage of proteins for which all
TM helices are predicted correctly). However, most methods
correctly identified >90% of all observed membrane
helices (Table 2, $Q_{htm}^{\%obs}$, percentage of all observed helices
that are predicted correctly). In fact, measured by this
score alone, most simple hydrophobicity-based methods appeared
more accurate than many advanced prediction methods,
but this success was achieved by overpredicting membrane
helices (Table 2, $Q_{htm}^{\%prd} < Q_{htm}^{\%obs}$, where
$Q_{htm}^{\%prd}$ is the percentage of all predicted helices that are
predicted correctly). Encouragingly, >80% of the helices
predicted by most methods were correct (Table 2, $Q_{htm}^{\%prd}$).
Unfortunately, the real problem with the simple methods
was that they did not correctly predict the nonmembrane
regions, as is apparent from levels of <70% correctly predicted
residues (Table 2, $Q_2$). Note that we implemented all simple
hydrophobicity scales by using the algorithm proposed by
the White group (Jayasinghe et al. 2001a). To ensure that
this optimized or at least did not penalize membrane protein
prediction for some hydrophobicity scales, we also tested
the thresholds suggested in the original publications for the
GES (hydrophobicity property; Engelman et al. 1986; Prabhakaran
1990) and KD scales (Kyte and Doolittle 1982).



Interestingly, the originally proposed thresholds decreased 
prediction accuracy (Supplementary Table 1; available on- 
line at http://www.proteinscience.org). 
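
The per-segment scores discussed here can be sketched in the same spirit. What counts as a correctly predicted helix is not defined in this section, so the sketch assumes that a predicted helix is correct when it overlaps an observed helix by at least `min_overlap` residues and that each predicted helix may be matched only once; the threshold of 3 and all names are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def per_segment_scores(observed, predicted, min_overlap=3):
    """Per-segment accuracy for one protein.

    observed, predicted: lists of inclusive (start, end) helix segments.
    min_overlap: assumed criterion for calling a helix correctly predicted.
    Returns (pct_obs, pct_prd, all_ok): the percentage of observed helices
    that are matched, the percentage of predicted helices that are matched,
    and whether every helix is matched with none left over.
    """
    used = set()
    matched = 0
    for obs in observed:
        # Greedy one-to-one matching: take the first unused predicted helix.
        for i, prd in enumerate(predicted):
            if i not in used and overlap(obs, prd) >= min_overlap:
                used.add(i)
                matched += 1
                break
    pct_obs = 100.0 * matched / len(observed) if observed else 0.0
    pct_prd = 100.0 * matched / len(predicted) if predicted else 0.0
    all_ok = matched == len(observed) == len(predicted)
    return pct_obs, pct_prd, all_ok
```

A method that finds both observed helices but also reports a third, spurious one scores 100% on observed helices and only about 67% on predicted helices, and the protein does not count toward the all-helices-correct score; this is the overprediction pattern described for the simple scales.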

Most advanced predictions were correct. All advanced 
prediction methods correctly identified all helices for most 
high-resolution proteins (Table 2, $Q_{ok}$). In contrast, the only
two methods we found to also accurately predict the orientation
of the helices, that is, the topology, most often were
TopPred2 (hydrophobicity-based membrane helix prediction)
and HMMTOP2 (Table 2, TOPO, percentage of proteins
for which the topology is correctly predicted). Note
that HMMTOP2 was developed using all the 36 high-resolution
chains for which we compiled the results. On the
other hand, TopPred2 used only four of the 36 chains when
it was developed. All methods tested correctly predicted
>70% of the residues in either of the two states, TMH (T)
and non-TMH (N, Table 2, $Q_2$). However, all methods significantly
underpredicted residues in membrane helices
(Table 2, $Q_{2T}^{\%obs} < Q_{2T}^{\%prd}$).

No single advanced method best by all scores. The set of 
36 high-resolution proteins was small enough to require 
extreme caution in ranking methods based on numerical 
differences. When comparing pairwise ranks of the methods 
Table 2. Accuracy of prediction methods for high-resolution set

              Per-segment accuracy                 Per-residue accuracy
Method        Q_ok  Q_htm^%obs  Q_htm^%prd  TOPO   Q_2  Q_2T^%obs  Q_2T^%prd  Q_2N^%obs  Q_2N^%prd

ERROR         ±10   ±8          ±10         ±9     ±3   ±1         ±8         ±6         ±6

DAS           79    99          96                 72   48         94         96         62
HMMTOP2       83    99          99          61     80   69         89         88         71
PHDhtm08      64    77          76          54     78   76         82         84         79
PHDhtm07      69    83          81          50     78   76         82         84         79
PHDpsihtm08   84    99          98          66     80   76         83         86         80
PRED-TMR      61    84          90                 76   58         85         94         66
SOSUI         71    88          86                 75   66         74         80         69
TMHMM1        71    90          90          45     80   68         81         89         72
TopPred2      75    90          90          54     77   64         83         90         69

KD            65    94          89                 67   79         66         52         67
GES           64    97          90                 71   74         72         66         69
Ben-Tal       60    79          89                 72   53         80         95         63
Eisenberg     58    95          89                 69   11         68         57         68
Hopp-Woods    56    93          86                 62   80         61         43         67
WW            54    95          91                 71   71         72         67         67
Av-Cid        52    93          83                 60   83         58         39         12
Roseman       52    94          83                 58   83         58         34         66
Levitt        48    91          84                 59   80         58         38         67
A-Cid         47    95          83                 58   80         56         37         66
Heijne        45    93          82                 61   85         58         34         64
Bull-Breese   45    92          82                 55   85         55         27         66
Sweet         43    90          83                 63   83         60         43         69
Radzicka      40    93          79                 56   85         55         26         63
Nakashima     39    88          83                 60   84         58         36         63
Fauchere      36    92          80                 56   84         56         31         65
Lawson        33    86          79                 55   84         54         27         63
EM            31    92          77                 57   85         55         28         64
Wolfenden     28    43          62                 62   28         56         97         56

Data set: 36 high-resolution membrane helical proteins from PDB. Note: We had reliable information
about topology for only 35 of the 36 proteins.
Abbreviations as in Table 1.

Methods, hydrophobicity scales: see the abbreviations footnote at the beginning of the article for the
advanced methods, and the list of hydrophobicity scales in the Materials and Methods section for the
hydrophobicity scales. The advanced methods are sorted alphabetically, the simple hydrophobicity-based
methods according to the Q_ok score.

ERROR: the estimates for per-segment accuracy resulted from a bootstrap experiment with M = 100
and K = 18 (Fig. 5); the estimates for per-residue accuracy were obtained according to equation 11.
Numbers in italics: two standard deviations below the numerically highest value in each column.
Note of caution: all methods are tested on the same set of proteins. However, the numbers are not
from a cross-validation experiment, that is, some methods may have used some of the proteins for
training. Generally, newer methods are more likely to be overestimated than older ones.



according to various scores, we found that no advanced 
method performed consistently best, and none consistently 
worst (Fig. 1). Interestingly, TMHMM1 and TopPred2 appeared
to be the most representative methods in that the
scores for these methods were most often indistinguishable 
from all other advanced methods in pairwise comparisons. 
In contrast, DAS appeared to be most unique in that it was 
often better and often worse than all other methods. Three 
methods were clearly more often worse than better: WW (5 
times better/30 times worse), PRED-TMR (6/23), and 
SOSUI (7/26). Three methods were clearly more often better
than worse: HMMTOP2 (21 times better/1 time worse),
PHDpsihtm08 (divergent profile-based neural network pre- 



diction of transmembrane helices) (27/2), and PHDhtm08 
(20/6). 
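
The better/worse counts quoted above come from comparing two methods score by score against the error margin of each score. A minimal sketch of one such comparison is given below; it assumes that the ERROR value reported for a column in Table 2 is used directly as the standard error of the difference, which is a simplification, and the function name is hypothetical.

```python
def compare(score_a, score_b, error):
    """Classify the difference between two methods on one score.

    Returns +2 or +1 if method A is better than B by more than two or
    one standard errors, -2 or -1 if it is worse by that much, and 0 if
    the two scores are indistinguishable.
    error: error margin of the score (assumed to apply to the difference).
    """
    diff = score_a - score_b
    if abs(diff) > 2 * error:
        level = 2
    elif abs(diff) > error:
        level = 1
    else:
        level = 0
    return level if diff > 0 else -level


# Values taken from Table 2: Q_ok (error ±10) and Q_2 (error ±3).
das_vs_ww_qok = compare(79, 54, 10)
hmmtop2_vs_das_q2 = compare(80, 72, 3)
```

Both comparisons come out at two standard errors, in line with the two examples given in the legend of Figure 1. Summing such outcomes over all eight scores and all eight other methods gives the 64 pair comparisons per method.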

Performance on low-resolution data set: distinct differ- 
ences. The low-resolution set was considerably larger (165 
proteins) than the high-resolution set (36 chains). Neverthe- 
less, we could still not find any method that performed 
consistently better than all the others (Table 3). Most meth- 
ods reached better per-segment scores for the high- than for 
the low-resolution data. The opposite was the case for per- 
residue scores as they were consistently higher for the low- 
resolution proteins. Most surprising may be the significant 
differences between the two data sets in terms of the percentage
of proteins for which all helices were correctly predicted
for the old methods DAS and TopPred2 ($Q_{ok}$ in
Tables 2 and 3). Even more stunning was the extremely
poor performance of most simple methods using only hydrophobicity
scales for the prediction. Interestingly, for the
hydrophobicity scales, the two newest ones (WW and Ben-Tal;
hydrophobicity scale representing the free energy of
transferring an amino acid from water into the center of the
hydrocarbon region of a lipid bilayer) performed best overall
on the data from low-resolution experiments.

Overall statistics: number of pair comparisons (total 64 for 9 methods and 8 scores)

Method         Better      Indistinguishable   Worse
DAS             5   15     27                   6   11
HMMTOP2         6   15     42                   1    0
PHDhtm08        4   16     38                   5    1
PHDpsihtm08    11   16     35                   2    0
PRED-TMR        2    4     35                  18    5
SOSUI           2    5     31                  24    2
TMHMM1          3    5     46                  10    0
TopPred2        3    7     44                  10    0
WW              1    4     29                  11   19

Fig. 1. Pairwise comparison of methods. For all high-resolution results compiled in Table 2, we show the pairwise comparison for eight
different scores and nine methods. Differences by more than one (two) standard error(s) are marked by one (two) arrow(s). Empty
boxes indicate that the difference between the respective scores of the two methods is not significant. For example, DAS is two standard
errors better than WW in terms of the number of correctly predicted proteins ($Q_{ok}$), whereas HMMTOP2 is two standard errors better
than DAS in terms of the overall per-residue accuracy ($Q_2$). The lower table summarizes the respective counts of pair-comparisons for
which a particular method is better or worse than the others. TopPred2 and TMHMM1 appear to be the most neutral methods (44 and
46 times indistinguishable), whereas DAS seems the most unique method in that it is often better than the others and equally often
worse. Note: only DAS, PHDhtm08, PHDpsihtm07, and TopPred2 did not use most of the proteins tested to optimize prediction
accuracy; thus, the results for all the other methods are likely to be overestimates.

Most errors were under- or overpredictions of one TMH. 
The good news was that all methods predicted the number 
of membrane helices correctly for most proteins (Fig. 2). 
However, this number differed significantly between the 
high- (71%) and the low-resolution data (56%). The major- 
ity of deviations were to predict one helix too few or one too 
many (68% for high; 64% for low-resolution, Fig. 2, cen- 
ter). Interestingly, the errors were rather symmetric for the 
low-resolution set, whereas they were substantially asym- 
metric for the high-resolution data. We could not find any 
significant correlation between the number of membrane 
helices and the errors of a particular method (data not 
shown). However, this may be largely owing to the few 
high-resolution structures in our data set. 
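The over- and underprediction counts described above can be tallied with a short script. The following is a minimal sketch, assuming each result is available as a pair (number of helices predicted, number observed); the function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def helix_count_errors(pairs):
    """Count how often each difference (predicted - observed) occurs."""
    return Counter(pred - obs for pred, obs in pairs)


def summarize(pairs):
    """Summarize the share of exact counts and of off-by-one errors."""
    diffs = helix_count_errors(pairs)
    total = sum(diffs.values())
    correct = diffs.get(0, 0)
    wrong = total - correct
    off_by_one = diffs.get(-1, 0) + diffs.get(1, 0)
    return {
        # percentage of proteins with the correct number of helices
        "pct_correct": 100.0 * correct / total,
        # percentage of all errors that miss or add exactly one helix
        "pct_errors_off_by_one": 100.0 * off_by_one / wrong if wrong else 0.0,
        # symmetry check: underpredictions versus overpredictions
        "under": sum(c for d, c in diffs.items() if d < 0),
        "over": sum(c for d, c in diffs.items() if d > 0),
    }


# Toy data (hypothetical): (predicted, observed) helix counts for 8 proteins.
toy = [(7, 7), (6, 7), (8, 7), (1, 1), (2, 1), (4, 6), (12, 12), (5, 5)]
print(summarize(toy))
```

Comparing the "under" and "over" counts gives a quick check of whether the errors are symmetric, as reported for the low-resolution set.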



Accuracy lower for proteins with more than five TMH's. 
For proteins with five or fewer membrane helices, the average
over all advanced methods exceeded 80% (Q_ok, eq. 4)
for the high-resolution data and 60% for the low-resolution 
data (Fig. 3). However, prediction accuracy dropped signifi- 
cantly for proteins with more than five helices to values 
from 33%-36% (Fig. 3). Why are proteins with less than 
five TMH's so different from proteins with more than six 
TMH's? Answers to this question remain speculative. 

Most proteins and most helices correctly predicted by one 
of the methods. None of the high-resolution helices has been 
consistently mispredicted by all programs. However, this 
may reflect that the more recent methods used all these 
proteins for training. In contrast, three transmembrane he- 
lices from three proteins of the low-resolution set were not 
identified by any of the methods: (1) The C4-dicarboxylate 
transport protein from Rhizobium meliloti (SWISS-PROT 
ID dcta_rhime; helix from residues 282-300, sequence 
ALPGLMNKMEKAGCKRSVV) has a relatively hydro- 
phobic sequence, but it has a polar stretch of residues, 
NKMEK, in the middle of the helix. The gene fusion con- 
structs were not always created with the reporter gene pres- 
ent in the predicted loop regions (Jording and Puhler 1993). 
Table 3. Accuracy of prediction methods for low-resolution set

Per-segment accuracy: Q_ok, Q_htm %obs, Q_htm %prd, TOPO. Per-residue accuracy: Q_2, Q_2T %obs, Q_2T %prd, Q_2N %obs, Q_2N %prd.

Method      | Q_ok | Q_htm %obs | Q_htm %prd | TOPO | Q_2 | Q_2T %obs | Q_2T %prd | Q_2N %obs | Q_2N %prd
ERROR       | ±9 | ±5 | ±5 | ±9 | ±2 | ±4 | ±4 | ±2 | ±2
DAS         | 39 | 93 | 81 |    | 86 | 65 | 85 | 97 | 84
HMMTOP2     | 66 | 94 | 93 | 79 | 90 | 85 | 83 | 91 | 91
PHDhtm08    | 57 | 86 | 86 | 68 | 87 | 83 | 75 | 90 | 94
PHDhtm07    | 56 | 85 | 86 | 72 | 87 | 83 | 75 | 90 | 94
PHDpsihtm08 | 67 | 95 | 94 | 67 | 89 | 87 | 77 | 92 | 96
PRED-TMR    | 58 | 92 | 93 |    | 90 | 78 | 86 | 94 | 89
SOSUI       | 49 | 88 | 86 |    | 88 | 79 | 72 | 88 | 90
TMHMM1      | 72 | 91 | 92 | 85 | 90 | 83 | 80 | 91 | 92
TopPred2    | 48 | 84 | 79 | 59 | 88 | 74 | 71 | 93 | 89
Ben-Tal     | 35 | 79 | 90 |    | 87 | 67 | 83 | 95 | 85
Wolfenden   | 29 | 56 | 82 |    | 80 | 47 | 76 | 97 | 79
WW          | 27 | 90 | 75 |    | 81 | 83 | 59 | 77 | 89
GES         | 23 | 93 | 68 |    | 78 | 87 | 53 | 72 | 91
Eisenberg   | 20 | 90 | 63 |    | 72 | 89 | 47 | 63 | 91
KD          | 13 | 88 | 59 |    | 63 | 91 | 42 | 50 | 91
Heijne      | 11 | 89 | 55 |    | 51 | 91 | 35 | 33 | 89
Hopp-Woods  | 11 | 87 | 58 |    | 54 | 90 | 36 | 38 | 88
Sweet       | 11 | 87 | 59 |    | 58 | 88 | 38 | 44 | 87
Av-Cid      | 10 | 87 | 58 |    | 53 | 89 | 36 | 38 | 87
Roseman     |  9 | 89 | 56 |    | 48 | 91 | 34 | 30 | 88
Levitt      |  9 | 88 | 56 |    | 49 | 91 | 35 | 32 | 88
Nakashima   |  9 | 88 | 56 |    | 50 | 90 | 35 | 34 | 87
A-Cid       |  8 | 87 | 57 |    | 52 | 89 | 35 | 36 | 87
Lawson      |  8 | 86 | 57 |    | 43 | 89 | 32 | 24 | 83
Radzicka    |  6 | 87 | 56 |    | 41 | 91 | 32 | 21 | 85
Bull-Breese |  6 | 86 | 56 |    | 40 | 91 | 32 | 20 | 83
EM          |  5 | 89 | 56 |    | 41 | 91 | 32 | 21 | 85
Fauchere    |  5 | 87 | 56 |    | 43 | 91 | 33 | 23 | 86


Data set: 165 low-resolution membrane helical proteins from SWISS-PROT (Moller et al. 2000). 
Note: We had reliable information about topology only for 140 of the 165 proteins. 
Abbreviations as in Table 2. The advanced methods are sorted by alphabet, the simple hydropho- 
bicity-based methods according to the Q ok score. 

Numbers in italics: two standard deviations below the numerically highest value in each column. 
Note of caution: all methods are tested on the same set of proteins. However, the numbers are not 
from a cross-validation experiment, that is, some methods may have used some of the proteins for 
training. Generally, newer methods are more likely to be overestimated than older ones. In particular, 
DAS, the PHD methods, and TopPred2 used only a small subset of these proteins for setting up the
method, whereas HMMTOP2 used most. 



In some cases, the reporter gene was present in the predicted 
membrane regions. This is a problem because it may alter 
the topological placement of the reporter gene with respect 
to the membrane. In addition, gene fusion constructs were 
not made for each loop region because reporter genes were 
introduced at random. Hence, not every loop was tested for its
topological placement, including the loops flanking helix 282-300.
The experimental evidence for this membrane
helix (282-300) was therefore weak, at best. (2) The Haemolysin
Secretion ATP-Binding Protein (HlyB) from Escherichia
coli (hlyb_ecoli, residues 38-51, sequence
GTGLGLTSWLLAAK) is an integral membrane protein. However,
the particular membrane helix missed appears very
short. The other seven membrane helices of HlyB are at
least 20 residues long. However, some authors have claimed



that membrane-spanning helices may be as short as 10 residues
(Lewis et al. 1990). The experimental evidence
for hlyb_ecoli had problems similar to those for dcta_rhime:
The experimentalists found it difficult to identify
membrane-spanning regions through predictions (Wang et al.
1991). This was caused by the high proportion of hydrophilic
residues in the N-terminal portion of HlyB. Consequently,
the authors did not know where to insert their
reporter gene, which in this case was β-lactamase. Thus, they
randomly inserted the reporter gene. Additionally, topological
models identify the short stretch as loop (Wang et al.
1991; Gentschev and Goebel 1992). (3) Like all other problematic
cases, the Mitochondrial brown fat uncoupling protein 1
from Rattus norvegicus (ucp1_rat, residues 178-194,
sequence PNLMRNVIINCTELVTY) has transmembrane
Fig. 2. Over- and underprediction of membrane helices. All methods (top panel): For all methods and all proteins in the high- and
low-resolution sets, the difference between the number of membrane helices predicted and observed is shown. Although the two 
distributions appear rather similar, the higher symmetry in the low-resolution graph hid that the percentages with no difference were 
quite different: 71% for the high-resolution data and 56% for the low-resolution data. The inset (center) underlined the observation that 
the majority of errors were due to under- or overpredicting one helix. 



regions that contain many polar residues. For this protein, 
the experimentalists stated that their data did not suffice to 
strongly conclude that residues 178-194 are in a membrane 
helix (Miroux et al. 1993). 
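To illustrate why a helix with a polar stretch escapes hydrophobicity-based detection, the sketch below applies a sliding-window average of the Kyte-Doolittle (KD) hydropathy scale to the 19-residue dcta_rhime segment quoted above. The window length of 19 and the threshold of 1.6 follow the commonly cited Kyte-Doolittle convention and are assumptions here; they are not necessarily the settings used for the KD results in Tables 2 and 3, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def predicted_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows at or above the threshold."""
    means = window_means(seq, window)
    return [i for i, m in enumerate(means) if m >= threshold]


# Helix 282-300 of dcta_rhime as given in the text.
segment = "ALPGLMNKMEKAGCKRSVV"
print(window_means(segment)[0])    # slightly below zero
print(predicted_windows(segment))  # no window passes the threshold
```

Under these assumptions the segment averages close to zero, far below the threshold, which is consistent with the observation that the polar stretch NKMEK keeps the helix from being detected.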

No significant difference in performance for prokaryotic 
and eukaryotic proteins. We compared the performance of 
each method for eukaryotic and prokaryotic proteins. Most
methods did not consistently perform better on one group across both the
high- and low-resolution data (Table 4, ΔQ_ok). In fact, the
trends differed greatly between the two data sets and between
different measures of prediction accuracy. Whereas prokaryotic
proteins were predicted more accurately in terms of
per-segment measures for the high-resolution data sets, the



opposite was the case for most methods when compared on 
the low-resolution set. Only four methods had a similar 
trend in Q ok : PRED-TMR predicted eukaryotic proteins 
more accurately; SOSUI, TopPred2, and WW predicted 
prokaryotic proteins more accurately for both sets. However,
none of the values exceeded two times the estimated
error, that is, none was statistically very significant. All
methods predicted topology (ΔTOPO) better for the prokaryotic
proteins in the high-resolution set and for the eukaryotic
proteins in the low-resolution set. When measuring
prediction accuracy in terms of per-residue performance
(ΔQ_2), we could not find any significant difference between
prokaryotic and eukaryotic proteins; all methods did slightly





[Figure 3: bar chart. Legend: high-resolution, low-resolution. X-axis: number of transmembrane helices.]

Fig. 3. Proteins with many helices predicted less accurately. We binned
the results for all advanced methods according to the number of observed
membrane helices such that the three classes contained similar numbers of
proteins (x-axis). Accuracy (y-axis) is measured in terms of the percentage
of proteins for which all helices are correctly predicted (Q_ok). For both the
high- and the low-resolution data, proteins with more than five membrane
helices were predicted at significantly lower levels of accuracy.



Table 4. Difference between eukaryotic and prokaryotic membrane proteins

Difference in accuracy eukaryotes vs. prokaryotes

Method      | High-resolution ΔQ_ok | High-resolution ΔTOPO | High-resolution ΔQ_2 | Low-resolution ΔQ_ok | Low-resolution ΔTOPO | Low-resolution ΔQ_2
ERROR       | ±14 | ±12 | ±20 | ±18 | ±6  | ±18
DAS         | 4   |     | 4   | -16 |     | 8
HMMTOP2     | -9  | -31 | 2   | 13  | 6   | 9
PHDhtm08    | -24 | -14 | 3   | 10  | 39  | 10
PHDhtm07    | -11 | -6  | 3   | 10  | 39  | 10
PHDpsihtm08 | -20 | -32 | 3   | 13  | 32  | 8
PRED-TMR    | 5   |     | 5   | 11  |     | 7
SOSUI       | -8  |     | 1   | -18 |     | 5
TMHMM1      | -20 | -39 | 0   | 6   | 12  | 5
TopPred2    | -12 | -18 | 0   | -12 | -32 | 7
WW          | -6  |     | 4   | -12 |     | 5


Data set: eukaryotic proteins: 19 in high-resolution set, 73 in low-resolu- 
tion set; prokaryotic proteins: 17 in high-resolution set, 87 in low-resolu- 
tion set. 

Accuracy: levels of accuracy given are the differences in the averages over 
all eukaryotic proteins minus the averages over all prokaryotic proteins. 
Number in italics: values that are ± two standard deviations from a differ- 
ence of 0. 
better for eukaryotic proteins for both high- and low-resolution
data. Nevertheless, because of the lack of consistent
direction of the difference and the lack of statistical signifi- 
cance, our data did not support the previously published 
conclusion that either prokaryotic or eukaryotic proteins 
were predicted more accurately. 

Accuracy of distinguishing between membrane and 
other proteins 

Few false positives: best methods found few membrane he- 
lices in globular proteins. Most advanced methods correctly 
distinguished between membrane and globular proteins 
(Table 5). The best methods confused the two types
of proteins for <4% of all globular proteins tested (Table 5).
DAS had the highest error rate of the advanced methods 
(16% false positives), which was surprising given that DAS 

Table 5. Confusion of membrane and globular proteins

Method      | False positives (%) | False negatives (%), high-resolution | False negatives (%), low-resolution
ERROR       | ±2  | ±9 | ±3
SOSUI       | 1   | 8  | 4
TMHMM1      | 1   | 8  | 4
Wolfenden   | 2   | 39 | 13
PHDpsihtm   | 2   | 3  | 8
PHDhtm08    | 2   | 19 | 23
Ben-Tal     | 3   | 11 | 4
PHDhtm07    | 3   | 14 | 16
PRED-TMR    | 4   | 8  | 1
HMMTOP2     | 6   | 0  | 1
TopPred2    | 10  | 8  | 11
DAS         | 16  | 0  | 0
WW          | 32  | 0  | 0
GES         | 53  | 0  | 0
Eisenberg   | 66  | 0  | 0
KD          | 81  | 0  | 0
Sweet       | 84  | 0  | 0
Hopp-Woods  | 89  | 0  | 0
Nakashima   | 90  | 0  | 0
Heijne      | 92  | 0  | 0
Levitt      | 93  | 0  | 0
Roseman     | 95  | 0  | 0
A-Cid       | 95  | 0  | 0
Av-Cid      | 95  | 0  | 0
Lawson      | 98  | 0  | 0
EM          | 99  | 0  | 0
Fauchere    | 99  | 0  | 0
Bull-Breese | 100 | 0  | 0
Radzicka    | 100 | 0  | 0


Data set: 616 high-resolution globular proteins from PDB (for false posi- 
tives, i.e., the test whether or not the methods incorrectly predict membrane 
helices in globular proteins). The membrane sets are identical to those 
given in Table 2 (high-resolution) and Table 3 (low-resolution). 
Methods are sorted by the accuracy in correctly rejecting globular protein 
(false positives). 

Numbers in italics: two standard errors below the lowest confusion rate. 



tended to underpredict residues in membrane helices. In 
contrast to the advanced methods, the simple methods dis- 
tinguished only poorly between membrane and globular 
proteins. The two exceptions were the old scale from 
Wolfenden (hydration potential; Wolfenden et al. 1981) and 
the new one from Ben-Tal (Kessel and Ben-Tal 2002). The 
latter also predicted membrane proteins rather accurately 
(Tables 2 and 3). However, most methods found helices in 
>90% of all the globular proteins. 

Few false negatives. Most methods find all membrane 
proteins. Although most hydrophobicity scales detected 
membrane helices in >90% of the globular proteins, they 
detected all membrane proteins as such. The exceptions 
were the two scales that were best in rejecting globular
proteins: Wolfenden and Ben-Tal (Table 5). Similarly,
PHDhtm08 misclassified only 2% of the globular proteins,
but also missed ~20% of the membrane proteins. The
only methods that misclassified <10% of the globular 
proteins and overlooked <10% of the membrane proteins 
were: SOSUI, TMHMM1, PHDpsihtm, PRED-TMR, and 
HMMTOP2 (Table 5). 

Signal peptides falsely predicted to be membrane helices 
by most methods. Even the advanced methods had high error 
rates for signal peptides (Table 6). In fact, one of the most 
accurate rejections of signal peptides was achieved by the 
simple method solely using the Wolfenden (Wolfenden et 
al. 1979) hydrophobicity scale (26% errors). Many of the 
false predictions were at the very beginning of the respec- 
tive secreted proteins. Thus, we tested the following simple 
expert rule: delete all membrane helices predicted between 
5 and 10 residues after an N-terminal methionine. For 
PHDpsihtm08, this reduced the falsely predicted signal pep- 
tides from 322 (23%) to 146 (10%). Encouragingly, when 
we applied the same rule to the set of membrane proteins, no 
helix was removed by this rule. For three out of the 1418 
signal peptides, PHDpsihtm08 incorrectly predicted two 
transmembrane helices. 
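The expert rule can be written as a simple post-filter on predicted helices. The following is a minimal sketch under one possible reading of the rule: a predicted helix is discarded when it starts within a fixed number of residues after an N-terminal methionine. The cutoff of 10 residues, the 1-based (start, end) representation, and the function name are assumptions for illustration only.

```python
def apply_signal_peptide_rule(sequence, helices, cutoff=10):
    """Drop predicted helices that begin right after an N-terminal Met.

    sequence: amino-acid string.
    helices:  list of (start, end) tuples, 1-based and inclusive.
    cutoff:   helices starting at most this many residues after the
              N-terminal methionine are discarded (assumed value).
    """
    if not sequence.startswith("M"):
        # The rule only applies to sequences with an N-terminal methionine.
        return list(helices)
    # The methionine is residue 1, so a helix starting at residue s
    # begins (s - 1) residues after it.
    return [(s, e) for (s, e) in helices if s - 1 > cutoff]


# Hypothetical secreted protein with one falsely predicted N-terminal helix.
seq = "M" + "A" * 99
print(apply_signal_peptide_rule(seq, [(6, 24), (60, 80)]))  # [(60, 80)]

# The counts reported in the text correspond to these percentages.
print(round(100 * 322 / 1418), round(100 * 146 / 1418))  # 23 10
```

Because the filter only looks at the start position, a genuine membrane helix that begins further into the sequence is left untouched, which matches the observation that no helix of the membrane set was removed.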

Discussion 

Confirming previous analyses

Some methods correctly distinguish globular from helical 
membrane proteins. Previous analyses showed that simple 
hydrophobicity-based methods have problems distinguish- 
ing between helical transmembrane and globular proteins 
(Edelman 1993; Jones et al. 1994; Rost et al. 1995; Jayas- 
inghe et al. 2001a; Moller et al. 2001). In general, we con- 
firmed this finding (Table 5). However, the Wolfenden and 
the Ben-Tal scales were clearly exceptional in this respect. 
Both performed on a par with the best advanced methods 
that predict membrane helices in at most 3% of all globular 
proteins (Table 5). Interestingly, these levels of accuracy are 
similar to the performance of the same methods six years 
Table 6. Incorrectly predicted membrane helices in signal peptides (false positives)

Method      | Percentage of proteins
Heijne      |
Levitt      | 99
A-Cid       | 99
Av-Cid      | 99
Lawson      | 99
EM          | 99
Fauchere    | 99
Bull-Breese | 99
Radzicka    | 99



Data set: 1418 sequence-unique signal peptides from http://www.cbs.dtu.dk/ftp/signalp/
collected by Nielsen and colleagues (Nielsen et al. 1996,
1997a,b).

Numbers in italics: two standard deviations below the lowest false-positive 
rate. 

ago (Rost et al. 1996a,b). This finding confirms that the 
globular proteins added to PDB over the last decade are not 
radically different from the structures that we knew before 
(Rost and Sander 1993; Rost 2001). Moller and colleagues 
published significantly more pessimistic estimates for the 
confusion between globular and membrane proteins (Moller 
et al. 2001). Whereas our estimates were based entirely on 
proteins of known structure, those from Moller et al. were 
based on proteins of unknown structure. Thus, we see two 
possible reasons for the difference between the two esti- 
mates. (1) Proteins in PDB differ from proteins in SWISS- 
PROT in their average length by almost a factor of 2 be- 
cause structural biologists often have to truncate the pro- 
teins to obtain high-resolution structures. We might argue 
that the truncated regions are more likely to be confused 
with membrane helices than the regions for which structure 
is determined. (2) Many of the proteins used by Moller and 
colleagues may, in fact, contain membrane helices or signal 
peptides (for which the error is higher, Table 6). We suspect 



that the truth lies somewhere between the two extremes. 
Hence, our estimates for the confusion between globular 
and membrane proteins may be slightly optimistic. 

Most methods confuse signal peptides and membrane he- 
lices. Moller et al. tested prediction methods on 34 signal 
and target peptides. They found that most methods incor- 
rectly predicted these regions to contain membrane helices. 
We tested all 27 methods on 1418 sequence-unique signal 
peptides. Our results confirmed the previously uncovered 
trends (Table 6). However, the larger set that we used re- 
vealed that TMHMM1, which is one of the best methods in 
this respect, confuses >30% of the signal peptides with 
membrane helices rather than <10% as previously estimated 
(Moller et al. 2001). Most simple methods based only on 
hydrophobicity scales confused >90% of all the signal pep- 
tides with membrane helices (exception: Wolfenden scale, 
Table 6). The good news was that the error could be reduced 
by experts who discard all membrane helices predicted 
closer than 10 residues to an N-terminal methionine. In this 
best-case scenario, PHDhtm and PHDpsihtm falsely predicted
only ~10% of the signal peptides as membrane helices.
Possibly, combinations of membrane-optimized and
signal-peptide-optimized programs could reduce this error 
rate. 

Most methods identify most membrane helices. We confirmed
(Ikeda et al. 2001; Jayasinghe et al. 2001b; Moller et
al. 2001) that many methods correctly predict most membrane
helices (Fig. 2). We also found the most common
mistake to be the under- or overprediction of a single trans- 
membrane helix. However, our results differed in detail 
from previous analyses (see below). 

Resolving differences in previous analyses 

Some methods are better; none is clearly best. Evaluations 
of membrane prediction methods are sometimes based on 
different definitions for performance accuracy. A particular 
example of the latter is to count a prediction of one long 
helix as correct although it stretches over two observed 
helices and thus misses the break in between the two. An- 
other misleading standard procedure is to only report values 
covering one side of the coin, that is, only the values of 
correctly predicted as percentage of observed or vice versa. 
Here, we carefully evaluated all methods on identical data 
sets and compiled all reasonable scores for prediction ac- 
curacy. To simplify the complexity, we focused in our re- 
port on a relatively limited number of scores. Another prob- 
lem with many previous analyses is that investigators have 
not estimated the error associated with a particular score. 
For example, from Table 1 we may conclude that 
HMMTOP2 is much better than TopPred2 when applying 
any measure for prediction accuracy. Although the numbers 
differed greatly, a thorough bootstrap experiment revealed 
that the performance of the two methods was indeed indistinguishable.
We compared the methods in a pairwise manner
for each score of the high-resolution data set (Fig. 1).
Some methods appeared more accurate than others. How- 
ever, no method(s) performed consistently better than all 
others by more than one standard error (Fig. 1). Our esti- 
mates of error margins explained the numerical differences 
found between three analyses (Ikeda et al. 2001; Jayasinghe 
et al. 2001b; Moller et al. 2001). 
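The idea of attaching an error estimate to each score before ranking methods can be sketched with a basic bootstrap. In the sketch below, each method is represented by a list of per-protein scores (for example 1 if all helices of a protein were predicted correctly, 0 otherwise); the standard error is estimated by resampling proteins with replacement, and a difference counts as one or two "arrows" when it exceeds one or two standard errors. Using the larger of the two standard errors as the yardstick, the number of resamples, and all names are assumptions; the exact bootstrap protocol of the study is not given in this section.

```python
import random
from statistics import stdev


def bootstrap_se(values, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of the mean."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the proteins with replacement and record the mean.
        sample = rng.choices(values, k=n)
        means.append(sum(sample) / n)
    return stdev(means)


def arrows(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return 0, 1, or 2: by how many standard errors two methods differ."""
    diff = abs(sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b))
    se = max(
        bootstrap_se(scores_a, n_resamples, seed),
        bootstrap_se(scores_b, n_resamples, seed),
    )
    if diff > 2 * se:
        return 2
    if diff > se:
        return 1
    return 0


# Hypothetical per-protein results for 36 chains.
method_a = [1] * 30 + [0] * 6
method_b = [1] * 28 + [0] * 8
print(bootstrap_se(method_a))
print(arrows(method_a, method_b))
```

With only 36 chains the standard error of a percentage score is several points, so numerically different scores can easily fall within one standard error of each other.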

Simple hydrophobicity-based methods less accurate than 
advanced methods. Moller et al. (2001) suggested that 
simple hydrophobicity scale-based methods predict mem- 
brane helices almost as accurately as the best advanced 
methods. We could not confirm this proposition. In contrast, 
we found that the best advanced methods were significantly 
more accurate than the best hydrophobicity-scale based 
methods, both in terms of per-segment and per-residue ac- 
curacy (Tables 2 and 3). The only possible exception may 
be the per-residue performance of the Ben-Tal scale for the 
low-resolution data (Table 3). However, we did confirm 
that, because of overprediction, a few hydrophobicity-scale-based
methods identify the observed membrane helices at a
level of accuracy similar to that of advanced methods
(Q_htm %obs in Tables 2 and 3). Jayasinghe et al. found that the
WW hydrophobicity scale-based method that they intro- 
duced outperformed even the best advanced methods ("We 
find that [the] WW scale ... identifies TM helices of mem- 
brane proteins with an accuracy greater than 99%"; Jayas- 
inghe et al. 2001a). We could also not confirm this finding, 
no matter which definition of prediction accuracy we com- 
pared. Nevertheless, the major problem with simple hydro- 
phobicity-based methods is their failure on globular proteins 
(Table 5) and signal peptides (Table 6). In fact, the error of 
hydrophobicity scales depends on the length of the protein. 
For example, the high-resolution chains had an average
length of ~215 residues, whereas low-resolution proteins
were, on average, ~420 residues long. Although hydrophobicity
scales correctly predicted all helices in 28%-65% of
the short proteins (Table 2), they only detected 5%-29% for 
the long proteins (Table 3). In particular, the scale that 
performed best on the high-resolution set (KD) dropped in 
accuracy from 65% (high) to 13% (low), whereas the scale 
that performed most poorly on the short proteins in the 
high-resolution data (Wolfenden) became best for the long 
proteins in the low-resolution data. The Wolfenden scale 
also performed relatively well on globular proteins (Table 
5) and on signal peptides (Table 6). The price for the lack of 
overprediction is a low accuracy in detecting membrane 
helices (underprediction). Overall, the most successful hy- 
drophobicity scale appeared to be the Ben-Tal scale, which 
is based on the free energy of transferring an amino acid 
from water into the center of the hydrocarbon region of a 
lipid bilayer (Kessel and Ben-Tal 2002). It out-performed 
the Wolfenden scale for membrane proteins and for globular 
proteins, and it bested all other scales for the low-resolution 



set. Simple hydrophobicity scales obviously have tremen- 
dous importance for sequence analysis. However, to use 
them as the only criterion to predict membrane helices ap- 
pears to be a bad idea. 

Incorrect ranking by per-segment accuracy depends on 
definition of score. As discussed above, any attempt to rank 
prediction methods should account for the standard error in 
the estimated level of accuracy. A particular illustration of 
this finding is that different definitions of the accuracy in 
correctly predicting all helices (eq. 4) would slightly alter 
the ranks. For example, DAS scored worst among all ad- 
vanced methods when an overlap of at least nine residues 
was required to consider a helix correctly predicted (defi- 
nition introduced by Moller et al. 2001), but it appeared to 
be the third-best of all advanced methods when we applied 
the definition introduced by Ikeda et al. (2001) (see Supple- 
mentary Table I; available online at http://www.protein- 
science.org). When giving different ranks only for signifi- 
cant differences, this apparent contradiction was resolved. 
Most averages were relatively insensitive to whether we 
required an overlap of 3 or 9 residues between predicted and 
observed helix (Q ok 3 and Q ok 9 in Supplementary Table 1; 
available online at http://www.proteinscience.org). How- 
ever, contrary to what has been claimed previously, some 
methods had lower averages when requiring nine overlap- 
ping residues. Similarly, for most methods the average 
scores did not change considerably when using the defini- 
tion of Ikeda et al (Q ok \ 1 Centre in Supplementary Table I; 
available online at http://www.proteinscience.org). How- 
ever, although the score was lower for most methods for 
which it differed from the other two, for a few it was actu- 
ally higher. These were methods that tended to underpredict 
helices. Overall, the dependence of ranking on the definition 
of the score used underscored the need to standardize evalu- 
ations. 
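The dependence of the per-segment score on the overlap criterion can be made concrete with a small sketch. Here a predicted helix counts as correct if it shares at least `min_overlap` residues with an observed helix, and a protein counts as correct only if observed and predicted helices match one-to-one. This one-to-one requirement is an assumption modeled on the description of Q_ok in the text (eq. 4 itself is not reproduced in this section), and the function names and example coordinates are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two (start, end) segments, inclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def all_helices_correct(observed, predicted, min_overlap=3):
    """True if every observed helix pairs with a distinct predicted helix."""
    if len(observed) != len(predicted):
        # One long helix covering two observed helices is not counted.
        return False
    used = set()
    for obs in observed:
        match = next(
            (j for j, prd in enumerate(predicted)
             if j not in used and overlap(obs, prd) >= min_overlap),
            None,
        )
        if match is None:
            return False
        used.add(match)
    return True


def q_ok(proteins, min_overlap=3):
    """Percentage of proteins with all helices correctly predicted."""
    ok = sum(all_helices_correct(o, p, min_overlap) for o, p in proteins)
    return 100.0 * ok / len(proteins)


# Hypothetical proteins: (observed helices, predicted helices).
shifted = ([(10, 30), (45, 65)], [(25, 44), (48, 66)])  # first helix overlaps by 6
exact = ([(10, 30)], [(10, 30)])
merged = ([(10, 30), (45, 65)], [(10, 65)])             # two helices fused into one

print(q_ok([shifted, exact], min_overlap=3))  # 100.0
print(q_ok([shifted, exact], min_overlap=9))  # 50.0
```

The shifted prediction passes with a 3-residue criterion and fails with a 9-residue criterion, which is the kind of sensitivity that can reorder methods when the score definition changes.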

Similar prediction accuracy for prokaryotic and eukary- 
otic membrane proteins. Ikeda et al. (2001) found that 
prediction methods are consistently worse at predicting 
membrane proteins from eukaryotes than those from pro- 
karyotes. We could not verify this finding. Both for the 
high- and for the low-resolution data sets, we found that 
some methods reached slightly higher levels on one than on 
the other (Table 4). However, the differences were not sig- 
nificant. 

Novel findings 

Low-resolution experiments not much more accurate than 
prediction methods. The low-resolution experiments dif- 
fered substantially in their assignments of membrane helices 
from high-resolution experiments. In fact, for a small subset 
of 13 high-resolution chains, many prediction methods ap- 
peared to be as correct — or as incorrect — as previously de- 
posited low-resolution experiments (Table 1). This problem 
was also reflected in the substantial differences between the
numerical scores for some of the methods. For example, 
DAS, TopPred2, and the PHDhtm series used partial infor- 
mation about 9 of the 36 high-resolution chains for devel- 
opment. For these methods, the scores on the 27 cross- 
validated high-resolution chains were similar to those for 
the 36 high-resolution chains (data not shown). However, 
the per-segment scores for the low-resolution sets differed 
from those for the high-resolution sets (Tables 2 and 3, in 
particular Q ok ). There are two possible explanations for this: 
either the low-resolution set contains new motifs, or the 
low-resolution experiments over- or underassign many he- 
lices. Such errors could result in a particularly poor perfor- 
mance in terms of predicting all TM helices correctly. In 
fact, for the set of 13 proteins for which we had low- and
high-resolution experiments, Q ok was low (84%, Table 1) 
for the low-resolution experiments. Furthermore, the obser- 
vation that DAS, TopPred2, and the PHDhtm series got 
higher per-residue scores on the low-resolution data than on 
the high-resolution data indicated that the low-resolution 
assignments might not reflect completely new membrane 
motifs. Thus, the accuracy of these cross-validated methods
may be correctly estimated by the high-resolution data set
(Table 2).

Problems with topology assignments by low-resolution 
data. The topologies of two proteins were incorrectly as- 
signed by the low-resolution experiments (Table 1). These 
two proteins were (1) PDB: 1EHK:B/SWISS-PROT:
COX2_THETH; and (2) PDB: 1EUL:A/SWISS-PROT:
ATA2_RABIT. (1) 1EHK:B has one membrane helix and 
the N terminus is in the periplasm. Thus, PDB annotates the 
topology IN. In contrast, SWISS-PROT (release 34) anno- 
tates COX2_THETH with topology OUT, despite experi- 
mental data indicating otherwise (Keightley et al. 1995). 
Note that the latest SWISS-PROT release still annotates 
COX2_THETH as OUT. (2) The second pair is more com- 
plicated: The old SWISS-PROT release 20 entry for 
ATCA_RABIT was annotated with 10 membrane helices
with topology IN, whereas the PDB structure 1EUL:A has 
10 membrane helices with topology OUT. In contrast, the 
latest SWISS-PROT release for ATA2_RABIT annotates 
10 helices, but still assigns the topology as IN according to 
antibody studies (Moller et al. 1997). However, this experi- 
mentally determined topology may be incorrect because of 
nonspecific antibodies for the N-terminus epitope. Indeed, 
the experimentalists noted that the antibody against the N 
terminus was only immunoreactive to the 1-243 N-terminal 
fragment rather than specific to the N-terminal 12 residues. 
At the same time, they argued that this antiserum can cor- 
rectly locate the epitope for residues 1-12 (Juul et al. 1995). 
They suggested that the N terminus is cytoplasmic, but for 
other cytosolic loops, the authors observed enhanced anti- 
body reactivities. Additionally, the N terminus may be OUT 
because after solubilization with C 12 E 6 , proteolysis did not 



drastically increase reactivity of antiserum 1-12. Further- 
more, antisera to epitopes on all loop regions of 
ATA2_RABIT were not tested. Therefore, it would be useful
to acquire information on the location of the other loops
in ATA2_RABIT to verify the topological orientation of 
this protein. 

All prediction methods missed only helices with weak 
experimental evidence. None of the helices in the high- 
resolution set and only three in the low-resolution set were 
missed by all advanced methods. As described above (in 
Results), the experiments done for these three proteins were 
not fully convincing in terms of the assignments of trans- 
membrane helices and topology. This observation suggests 
implementing a consensus prediction of membrane helices. 
The potential success of such an approach has been initially 
tried out by a couple of authors (Promponas et al. 1999; 
Ikeda et al. 2001). However, these two initial attempts have 
focused only on advanced methods. Although advanced 
methods are more accurate than simple hydrophobicity- 
based methods, they tend to underpredict transmembrane 
helices, especially for high-resolution structures (Table 2). 
Advanced methods could thus serve as a specificity filter for 
a consensus method. Using both advanced and simple meth- 
ods could help to verify low-resolution experimental results 
from proteolysis and gene fusion. 
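A consensus prediction of the kind suggested here could be sketched as a per-residue vote over the helices predicted by several methods. The example below uses a simple majority vote, which is an assumption chosen for illustration; it is not the procedure of Promponas et al. (1999) or Ikeda et al. (2001), and it does not implement the specificity-filter idea beyond the choice of `min_votes`. All names and coordinates are hypothetical.

```python
def to_mask(helices, length):
    """Per-residue boolean mask from 1-based inclusive (start, end) helices."""
    mask = [False] * length
    for start, end in helices:
        for i in range(start - 1, end):
            mask[i] = True
    return mask


def consensus_helices(predictions, length, min_votes=None):
    """Segments where at least min_votes methods predict a membrane helix.

    predictions: one list of (start, end) helices per method.
    min_votes:   defaults to a strict majority of the methods.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = [0] * length
    for helices in predictions:
        for i, in_helix in enumerate(to_mask(helices, length)):
            votes[i] += in_helix
    segments, start = [], None
    # A trailing zero closes a segment that runs to the end of the sequence.
    for i, v in enumerate(votes + [0]):
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            segments.append((start + 1, i))  # convert back to 1-based
            start = None
    return segments


# Three hypothetical methods on a 60-residue protein.
method_1 = [(10, 30)]
method_2 = [(12, 32)]
method_3 = [(40, 55)]
print(consensus_helices([method_1, method_2, method_3], 60))  # [(12, 30)]
```

Raising `min_votes` makes the consensus more specific, while lowering it to 1 yields the union of all predictions and therefore favors sensitivity.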

Not all membrane proteins identified. The only advanced 
method that predicted all known helical membrane proteins 
to contain at least one helix was DAS (Table 5, false nega- 
tives). However, the flip-side of the same coin was that 
DAS also performed poorly on globular proteins (Table 5, 
false positives). The other extreme was PHDhtm, based on 
conventional pairwise alignments that performed well in 
rejecting globular proteins while also missing almost one- 
fifth of the membrane proteins with the default parameters. 
Obviously, there is a tradeoff between predicting too many 
globular as membrane proteins, and too many membrane as 
globular proteins. Possibly the best compromise was 
achieved by SOSUI and TMHMM, which missed 6% of the 
membrane proteins while incorrectly predicting membrane 
helices in ~1% of all globular proteins. PHDhtm based on
PSI-BLAST profiles (PHDpsihtm) reached a similar com- 
promise: 8% of the membrane proteins were missed, and 
2% of all globular proteins were mispredicted. Neverthe- 
less, the problem of missing membrane proteins underlines 
once again that we need better methods that correctly dis- 
tinguish between globular and membrane proteins. 

Dependence of prediction accuracy on number of helices. 
We did not find any significant difference in the performance 
between proteins with one and many membrane helices. In 
contrast, proteins with ≤5 membrane helices were predicted 
more accurately than proteins with more (≥6, Fig. 2B). 
Although we could label the difference as 
significant, we failed to come up with any reasonable ex- 
planation for this finding. Readers may speculate that the 
numerical differences we observe between 6TM and 7TM 
proteins could be explained by the overabundance of trans- 
porters with buried charged residues. However, the number 
of proteins in each category was too small to validate such 
a fine-grained distinction. 

Conclusion 

We also overestimated the performance. Although we spent 
considerable effort on comparing prediction methods, our 
comparisons suffered from one crucial problem: We do not 
have cross-validation data available for all methods. In fact, 
the only methods for which we had cross-validated results 
were DAS, PHDhtm, PHDpsihtm, TopPred2, and most of 
the simple methods using only hydrophobicity scales. Al- 
though the overall scores for the advanced methods did not 
differ substantially between the sets of 27 cross-validated 
and 36-non-cross-validated high-resolution chains (data not 
shown), they did differ markedly between the nine chains 
used for development and the 27 cross-validated chains. 
This seemingly contradictory result is explained by the 
simple fact that most high-resolution proteins were not used 
in the development of these methods. In contrast, the newer 
prediction methods PRED-TMR, SOSUI, TMHMM, and 
WW used most and HMMTOP2 used all of the high-reso- 
lution chains for development. In fact, we observed two 
trends: (1) Newer methods were slightly better than older 
ones (HMMTOP2 was clearly more accurate than 
HMMTOP1 when tested on a small subset of the data); and 
(2) methods based on alignments were superior to those 
based on single sequences; in fact, when switching from 
using MaxHom (dynamic programming algorithm for con- 
servation weight-based multiple sequence alignment) align- 
ments against SWISS-PROT as input to PHDhtm to using 
PSI-BLAST alignments against all known sequences 
(BIG, the nonidentical merger of SWISS-PROT, TrEMBL, and 
PDB; that is, PHDpsihtm), prediction accuracy increased 
considerably. 

Most methods get most membrane helices, but the type of 
membrane protein is often wrong. The most common mis- 
take was the under- or overprediction of one transmembrane 
helix. This appears encouraging in terms of prediction meth- 
ods, in general. However, membrane predictions are very 
important in the context of analyzing entire proteomes be- 
cause the number and orientation of the helices typically 
reveal aspects about function. In fact, only the very best 
methods predict all helices and the topology more often 
correctly than not. We may rightfully argue that present 
methods are still not good enough. Because both the number 
of helices and their orientation can easily be altered by 
engineering (Nilsson and von Heijne 1998; Ota et al. 1998; 
Monne et al. 1999a,b), the task at hand is, however, not an 
easy one. These experiments along with our analysis of the 
conservation of transmembrane helices strongly argue 
against the view that the number and orientation of membrane 
helices constitute a "solid reality written into the sequence." 
Rather, single residue exchanges can alter these 
macroscopic features. Thus, correct predictions require a 
precision typically not achieved. Perhaps present methods 
have reached the maximum possible level of accuracy and 
the chapter of simply predicting the location and orientation 
of membrane proteins is closed. With the recent high-reso- 
lution structures challenging common assumptions and our 
present analysis highlighting the number of urgent problems 
with prediction methods, we strongly doubt this. Therefore, 
we contend that the issues elucidated in this investigation 
have reopened the field rather than closed it. 

Materials and methods 

Data sets 

High-resolution data sets for membrane proteins. We started with 
a total set of 105 chains from helical membrane proteins for which 
a high-resolution structure was deposited in PDB (Berman et al. 
2000). We identified these as helical membrane proteins according 
to the excellent up-to-date collection of membrane proteins at 
http://blanco.biomol.uci.edu (Jayasinghe et al. 2001b). 

Low-resolution data sets for membrane proteins. We used an 
expert-curated set of 165 helical membrane proteins that was col- 
lected by Stefan Moller and colleagues (Moller et al. 2000). For all 
these proteins, good low-resolution experimental evidence about 
localization was available. For the comparison between high-reso- 
lution and low-resolution data, we used the annotations we found 
about transmembrane helix location in old SWISS-PROT versions 
released prior to the publication of the high-resolution structures. 

High-resolution data set for globular proteins. The EVA server 
(Eyrich et al. 2001) continuously maintains a sequence-unique 
subset of PDB proteins. We used the version from July 2001 with 
1852 representative protein chains. From that set we first removed 
all membrane proteins. Then we removed all proteins that were 
similar to one representative in a SCOP superfamily (Murzin et al. 
1995; Lo Conte et al. 2000). Representatives were taken to be the 
longest proteins in the respective superfamily. This procedure 
yielded a final set of 616 globular protein chains. 

Data set of proteins with known signal peptides. Henrik Nielsen 
and colleagues at the CBS in Copenhagen keep an up-to-date list 
of experimentally known signal peptides at their Web site (http:// 
www.cbs.dtu.dk/ftp/signalp/readme). This group also spent con- 
siderable effort at defining thresholds for what constitutes redun- 
dancy in sets of signal peptides (Nielsen et al. 1996, 1997a). We 
downloaded a set of 1418 sequence-unique signal peptides from a 
total list of 2845. 

Sequence-unique subsets reduce bias. Many of the proteins for 
which we have information about TM regions are similar to one 
another. If we want to analyze prediction methods or simple fea- 
tures such as TM length, this bias is problematic. To reduce the 
bias from the set of enzymes of known function, we have to first 
generate all-against-all alignments that capture the bias existing in 
that set. Then, we have to choose the maximal subset that fulfils 
the constraint that no pair in that subset is sequence-similar. Tech- 
nically, we accomplished this objective in the following way. First, 
a pairwise BLAST (Altschul and Gish 1996) aligned all membrane 
proteins against each other. Second, the resulting pairs were 
filtered applying the HSSP-threshold (DIST = 0, below) such that 
all remaining pairs were likely to have similar structures. Third, 
the resulting families were sorted by number of members and 
length. Fourth, all pairs were clustered with a simple greedy algo- 
rithm starting with the largest and longest families (Hobohm et al. 
1992). Note that the threshold chosen roughly translated to "no 
pair with more than 33% sequence identity over more than 100 
residues aligned." In particular, we used the following formula to 
compile the distance DIST from the HSSP-curve HSSP_PIDE 
(Rost 1999): 



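$$\mathrm{DIST} = \mathrm{PIDE} - \mathrm{HSSP\_PIDE}(n), \qquad \mathrm{HSSP\_PIDE}(n) = \begin{cases} 100 & n \le 11 \\ 480 \cdot n^{-0.32\,(1 + e^{-n/1000})} & 11 < n \le 450 \\ 19.5 & n > 450 \end{cases} \tag{1}$$

where n is the number of aligned residues (Rost 1999) and 
PIDE is the percentage pairwise sequence identity (ignoring 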
gaps and insertions). This procedure yielded 36 proteins in the 
high-resolution set, and 165 proteins in the low-resolution set. 
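
The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the curve follows the published HSSP formula of Rost (1999), while the function names (`hssp_pide`, `dist`, `greedy_unique_subset`), the input format, and the simplified greedy selection are assumptions made for the example. In particular, the greedy step assumes the proteins arrive already sorted by priority (for instance, largest and longest families first) and does not reproduce the Hobohm et al. (1992) procedure in detail.

```python
import math


def hssp_pide(n_aligned):
    """HSSP-curve (Rost 1999): identity threshold in percent for an
    alignment with n_aligned residues."""
    if n_aligned <= 11:
        return 100.0
    if n_aligned <= 450:
        exponent = -0.32 * (1.0 + math.exp(-n_aligned / 1000.0))
        return 480.0 * n_aligned ** exponent
    return 19.5


def dist(pide, n_aligned):
    """Distance of a pair from the HSSP-curve; DIST >= 0 marks a pair
    that is likely to be structurally similar."""
    return pide - hssp_pide(n_aligned)


def greedy_unique_subset(proteins, similar_pairs):
    """Keep a protein only if it is not similar to any protein kept so far.

    proteins: names, assumed pre-sorted by priority (hypothetical input).
    similar_pairs: set of frozenset({a, b}) for all pairs with DIST >= 0.
    """
    kept = []
    for name in proteins:
        if all(frozenset((name, other)) not in similar_pairs for other in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # A pair with 33% identity over 100 aligned residues lies above the curve.
    print(round(dist(33.0, 100), 2))
    print(greedy_unique_subset(["a", "b", "c"], {frozenset(("a", "b"))}))
```

The greedy pass is order-dependent by design: whichever member of a similar pair comes first in the sorted list is the one retained.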

Programs tested 

Building multiple alignments. Two different alignment schemes 
were explored: (1) the dynamic programming method MaxHom 
(Sander and Schneider 1991), and (2) a profile-based PSI-BLAST 
(Altschul et al. 1997). The particular protocol for finding similari- 
ties with PSI-BLAST applied the usual precautions to avoid drift 
and pollution (Jones 1999; Przybylski and Rost 2002). Searches 
were restricted to three iterations, and the iteration parameter 
(H-value) was set to 10^-10. The search databases were SWISS-PROT 
(Bairoch and Apweiler 2000) and BIG (SWISS-PROT [Bairoch 
and Apweiler 2000] + TrEMBL [Bairoch and Apweiler 2000] + 
PDB [Berman et al. 2000]). To explore the conservation of 
membrane helices, we filtered all MaxHom alignments according 
to various distances DIST (eq. 1). 

Advanced prediction methods. We referred to prediction meth- 
ods as advanced when they implement more than simple hydro- 
phobicity scales. We tested the following programs: DAS, 
HMMTOP (version 2), PHDhtm, PHDpsihtm, PRED-TMR, SO- 
SUI, TMHMM (version 2), and TopPred2. TopPred2 averages the 
GES-scale of hydrophobicity (Engelman et al. 1986) using a trap- 
ezoid window (von Heijne 1992; Sipos and von Heijne 1993). 
PHDhtm combines a neural network using evolutionary information 
with a dynamic programming optimization of the final prediction 
(Rost et al. 1995, 1996b). DAS optimizes the use of 
hydrophobicity plots (Cserzo et al. 1997). SOSUI (Hirokawa et al. 
1998) uses a combination of hydrophobicity and amphiphilicity 
preferences to predict membrane helices. TMHMM is the most 
advanced, and seemingly most accurate, present method to predict 
membrane helices (Sonnhammer et al. 1998). It embeds a number 
of statistical preferences and rules into a hidden Markov model to 
optimize the prediction of the localization of membrane helices 
and their orientation (note: similar concepts are used for 
HMMTOP; Tusnady and Simon 1998). PRED-TMR uses a stan- 
dard hydrophobicity analysis with emphasis on detecting the ends 
and beginnings of membrane helices (Pasquier et al. 1999). 

Simple methods exclusively based on hydrophobicity scales. We 
also implemented our in-house prediction methods that simply 
used various hydrophobicity scales for prediction. In particular, we 
tested the following scales: A-Cid, normalized hydrophobicity 
scale for a-proteins (Cid et al. 1992); Av-Cid, normalized average 
hydrophobicity scale (Cid et al. 1992); Ben-Tal, Hydrophobicity 
scale representing free energy of transfer of an amino acid from 
water into the center of the hydrocarbon region of a model lipid 
bilayer (Kessel and Ben-Tal 2002); Bull-Breese, Bull-Breese 
hydrophobicity scale (Bull 1974); Eisenberg, normalized consensus 
hydrophobicity scale (Eisenberg et al. 1984); EM, solvation free 
energy (Eisenberg and McLachlan 1986); Fauchere, hydrophobic 
parameter π from the partitioning of N-acetyl-amino-acid amides 
(Fauchere and Pliska 1983); GES, hydrophobicity property 
(Engelman et al. 1986; Prabhakaran 1990); Heijne, transfer free 
energy to lipophilic phase (von Heijne and Blomberg 1979); 
Hopp-Woods, Hopp-Woods hydrophilicity value (Hopp and Woods 
1981); KD, Kyte-Doolittle hydropathy index (Kyte and Doolittle 
1982); Lawson, transfer free energy (Lawson et al. 1984); Levitt, 
hydrophobic parameter (Levitt 1976); Nakashima, normalized 
composition of membrane proteins (Nakashima et al. 1990); 
Radzicka, transfer free energy from 1-octanol to water (Radzicka and 
Wolfenden 1988); Roseman, solvation-corrected side-chain hy- 
dropathy (Roseman 1988); Sweet, optimal matching hydrophobic- 
ity (Sweet and Eisenberg 1983); Wolfenden, hydration potential 
(Wolfenden et al. 1981); and WW, Wimley-White scale (Jayas- 
inghe et al. 2001a). Replacing the WW scale with each of the 
above-mentioned hydrophobicity indices, we used the WW algo- 
rithm to evaluate the predictive performance of each index. 

Measuring accuracy 

Measuring per-segment accuracy. The ultimate goal of prediction 
methods obviously is to correctly predict all residues. Assume a 
protein with 10 membrane helices of 20 residues each; method A 
predicts 10 helices but gets the five residues at each end of each 
helix wrong, and method B misses four helices but gets the ends 
for the other six entirely right. Which method is better? Possibly, 
many readers would favor method A. This problem is captured in 
using two different scores measuring prediction accuracy in the 
field of globular secondary structure prediction: per-residue scores 
and per-segment scores (Rost and Sander 1993; Rost et al. 1994). 
Although globular secondary-structure segments are, on average, 
rather short (helices ~10 residues, strands ~5 residues), membrane 
helices are rather long. Consequently, the problem of evaluating 
the per-segment accuracy allows a more coarse-grained measure 
than required for globular secondary-structure prediction (Rost et 
al. 1994; Zemla et al. 1999). There are two separate issues to 
address when defining a helix to be predicted correctly. The first 
concerns counting the same helix twice. We used the simple con- 
cept of "correctly predicted segment" shown in Figure 4. 

Fig. 4. Correctly predicted segments. In this example, there are three 
observed and three predicted helices. Observed helix O1 is correctly predicted 
by P1 as they overlap. However, observed helix O2 is not correctly 
predicted because P1 already overlaps with O1. Hence, P1 cannot be used as 
a correct prediction for O2. Similarly, P2 is counted as correct only with 
respect to O3, whereas P3 is not since O3 was already predicted by P2. 

In particular, the observed helix O2 is not correctly predicted, 
because P1 overlaps already with O1. Similarly, P2 is counted as 
correct with respect to O3, whereas P3 is not. The second issue 
concerns the minimal overlap required between the observed and 
predicted helix. If not stated otherwise, we required a minimal 
overlap of 3 residues, following the definitions previously used in 
many other publications (von Heijne 1992; Jones et al. 1994; 
Persson and Argos 1994; von Heijne 1994; Rost et al. 1995, 1996b; 
Persson and Argos 1996; Sonnhammer et al. 1998). Moller et al. 
(2001) used a similar procedure; however, they required an overlap 
of at least 9 rather than 3 residues. Other groups required a mini- 
mal overlap of 1 residue (e.g., Cserzo et al. 1997; Tusnady and 
Simon 1998). Jayasinghe required an overlap of 9 (Jayasinghe et 
al. 2001b) and 3 (Jayasinghe et al. 2001a) residues; however, in 
both publications, they counted the same predicted helix twice, 
thus yielding 100% accuracy for the overlap between O1/P1 and 
O2/P2 in Figure 4. Yet another measure was introduced by Ikeda 
et al. (2001): Helices were considered as correctly predicted if 
the centers of the predicted and the observed helix overlapped 
by at least 11 residues. The different measures are illustrated in 
the following example for a prediction (T = transmembrane): 

observed: 

predict 1 : TTTTTTTTTT 

predict 2 : TTTTTTTTTTTTTTTT 

predict 3 : TTTTTTTTTTTTTTTTTTTT 

predict 4 : TTTTTTTTTTTTTTTTTTTTT 

Jayasinghe et al. (2001a) evaluate prediction 1 as 0% accurate 
and 2-4 as 100% accurate (two helices correct); Jayasinghe et al. 
(2001b) give predictions 1 and 2 0% and 3 and 4 100%; Tusnady 
and Simon (1998) give 1-4 50% (one helix right, one not); Moller 
et al. (2001) give 1-2 0% and 3-4 50%; Ikeda et al. (2001) give 
1-3 0% and 4 50%; the score that we refer to in this manuscript 
gives 1 0% and 2-4 50%. For comparison, we also provided a few 
other scores in the Supplementary Material (available online at 
http://www.proteinscience.org; note that we, however, did not 
count helices twice in any of those definitions). 

With this concept, we can compile the percentage of correctly 
predicted transmembrane helices: 

$$Q_{htm}^{\%obs} = 100 \cdot \frac{\text{number of correctly predicted TM in data set}}{\text{number of TM observed in data set}} \tag{2}$$

where $Q_{htm}^{\%obs}$ estimates the likelihood that an actual membrane 
helix is correctly predicted. Although this score can also be com- 
piled for a single protein, it would be misleading to compile the 
score for each protein in a data set and then to average over all 
proteins. Rather, the number should be compiled by pooling all 
membrane helices from an entire data set. Overpredictions are 
measured by the corresponding score: 



$$Q_{htm}^{\%prd} = 100 \cdot \frac{\text{number of correctly predicted TM in data set}}{\text{number of TM predicted in data set}} \tag{3}$$

where $Q_{htm}^{\%prd}$ estimates the likelihood that a predicted TM is 
correctly predicted. These two scores are merged into a score that 
describes for which percentage of the proteins all TM segments are 
correctly predicted: 



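$$Q_{ok} = \frac{100}{N_{prot}} \sum_{i=1}^{N_{prot}} \delta_i, \qquad \delta_i = \begin{cases} 1 & \text{if } Q_{htm}^{\%obs} \wedge Q_{htm}^{\%prd} = 100 \text{ for protein } i \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Thus, $Q_{ok}$ becomes 100 if and only if for all proteins in the set both 
$Q_{htm}^{\%obs}$ and $Q_{htm}^{\%prd}$ reach 100%. Finally, we need to evaluate 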
the accuracy of predicting the topology correctly: 



$$\mathrm{TOPO} = 100 \cdot \frac{\text{number of proteins with correctly predicted topology}}{\text{number of proteins}} \tag{5}$$
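
The per-segment scores of equations 2 to 4 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the evaluation code used here: helices are given as inclusive (start, end) residue ranges, observed helices are matched left to right, and each predicted helix may be used only once, mirroring the rule of Figure 4. The function names and the example coordinates are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_correct(observed, predicted, min_overlap=3):
    """Count observed helices matched by a not-yet-used predicted helix."""
    used = set()
    correct = 0
    for obs in observed:
        for j, prd in enumerate(predicted):
            if j not in used and overlap(obs, prd) >= min_overlap:
                used.add(j)  # a predicted helix is never counted twice
                correct += 1
                break
    return correct


def segment_scores(data_set, min_overlap=3):
    """Return (Q_htm %obs, Q_htm %prd, Q_ok), pooled over the data set.

    data_set: list of (observed, predicted) helix lists, one per protein.
    """
    n_correct = n_observed = n_predicted = n_all_ok = 0
    for observed, predicted in data_set:
        c = count_correct(observed, predicted, min_overlap)
        n_correct += c
        n_observed += len(observed)
        n_predicted += len(predicted)
        if c == len(observed) == len(predicted):
            n_all_ok += 1  # all helices found and none overpredicted
    q_obs = 100.0 * n_correct / n_observed if n_observed else 0.0
    q_prd = 100.0 * n_correct / n_predicted if n_predicted else 0.0
    q_ok = 100.0 * n_all_ok / len(data_set) if data_set else 0.0
    return q_obs, q_prd, q_ok


if __name__ == "__main__":
    # Hypothetical coordinates arranged like Figure 4: P1 spans O1 and O2,
    # while P2 and P3 both overlap O3.
    fig4 = ([(10, 30), (35, 55), (80, 100)], [(12, 50), (78, 88), (92, 105)])
    perfect = ([(5, 25)], [(6, 24)])
    print(segment_scores([fig4, perfect]))
```

Note that the helix counts are pooled over the whole data set before dividing, as recommended above, rather than averaged protein by protein.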



Measuring per-residue accuracy. Although the per-segment scores 
capture most of what experts would intuitively consider as impor- 
tant features of TMH prediction methods, we also need to monitor 
a number of per-residue scores that evaluate how accurately par- 
ticular residues are predicted. In particular, the example of P2 and 
P3 in Figure 4 would yield 0 for all per-segment scores, although 
the predictions somehow capture important information. The simplest 
per-residue score is the two-state per-residue accuracy $Q_2$, 
which measures the percentage of residues predicted correctly in 
either of the two states T (membrane helix) or N (not membrane): 

$$Q_2 = \frac{100}{N_{prot}} \sum_{i=1}^{N_{prot}} \frac{\text{number of residues predicted correctly in protein } i}{\text{number of residues in protein } i} \tag{6}$$



Typically, most residues in membrane proteins are in globular 
regions (Liu and Rost 2001). Thus, nonmembrane residues tend to 
dominate $Q_2$. This problem can be overcome by simply measuring 
the percentage of residues correctly predicted in membrane segments: 

$$Q_{2T}^{\%obs} = 100 \cdot \frac{\text{number of residues correctly predicted in TM helices}}{\text{number of residues observed in TM helices}} \tag{7}$$



Similar to the per-segment scores, overpredictions can be captured 
by the corresponding score: 



$$Q_{2T}^{\%prd} = 100 \cdot \frac{\text{number of residues correctly predicted in TM helices}}{\text{number of residues predicted in TM helices}} \tag{8}$$



$Q_{2N}^{\%obs}$ and $Q_{2N}^{\%prd}$ are the corresponding percentages for 
nonmembrane residues. Finally, we monitored the Matthews correlation 
index (Matthews 1975) that attempts to capture both over- and 
underprediction of residues in transmembrane helices by one 
single score. This index is defined as: 

$$\frac{p_T \cdot n_T - u_T \cdot o_T}{\sqrt{(p_T + u_T) \cdot (p_T + o_T) \cdot (n_T + u_T) \cdot (n_T + o_T)}} \tag{9}$$

where $p_T$ is the number of residues correctly predicted as 
membrane helix (TMH), $n_T$ is the number of residues correctly 
predicted as non-TMH, and $u_T$ and $o_T$ are the number of residues 
under- and overpredicted, respectively. 
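
As a worked illustration of the per-residue scores, the following Python sketch computes the four residue counts and the scores of equations 7 to 9 for a single protein, together with the per-protein percentage that equation 6 averages over all proteins. The string encoding (one character per residue, T or N) and the function names are assumptions made for the example.

```python
import math


def residue_counts(observed, predicted):
    """Return (p_T, n_T, u_T, o_T) for two equal-length T/N strings."""
    p = n = u = o = 0
    for obs, prd in zip(observed, predicted):
        if obs == "T" and prd == "T":
            p += 1  # correctly predicted as membrane helix
        elif obs == "N" and prd == "N":
            n += 1  # correctly predicted as nonmembrane
        elif obs == "T":
            u += 1  # underpredicted: observed T, predicted N
        else:
            o += 1  # overpredicted: observed N, predicted T
    return p, n, u, o


def per_residue_scores(observed, predicted):
    """Return per-protein Q2, Q2T %obs, Q2T %prd, and the Matthews index."""
    p, n, u, o = residue_counts(observed, predicted)
    total = p + n + u + o
    q2 = 100.0 * (p + n) / total if total else 0.0
    q2t_obs = 100.0 * p / (p + u) if (p + u) else 0.0
    q2t_prd = 100.0 * p / (p + o) if (p + o) else 0.0
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    mcc = (p * n - u * o) / denom if denom else 0.0
    return q2, q2t_obs, q2t_prd, mcc


if __name__ == "__main__":
    print(per_residue_scores("NNTTTTTTNN", "NTTTTTNNNN"))
```

In this toy case the prediction shifts the helix by one residue at the start and truncates it by two at the end, which lowers the per-residue scores even though the segment itself would count as correctly predicted.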

Estimating error for per-residue accuracy: standard error. For 
globular proteins, prediction accuracy varies considerably between 
different proteins (Rost et al. 1993; Rost 1996). The corresponding 
distributions can be approximated by Gaussian distributions. Thus, 
we can estimate the standard error of score Q by the simple rule- 
of-thumb: 



$$SE(Q) = \frac{\sigma(N_{\text{prot-large}})}{\sqrt{N_{\text{prot-set}}}} \tag{10}$$

where σ is the standard deviation for score Q based on a data set 
of $N_{\text{prot-large}}$ proteins. This set has to be sufficiently large to 
actually observe a normal distribution. Assuming that we only have a 
much smaller data set of $N_{\text{prot-set}}$ proteins, we can then still 
approximate the standard error by using the standard deviation 
compiled over the large data set. Whereas this concept is easy to apply 
to evaluations of globular prediction methods (Eyrich et al. 2001; 
Rost and Eyrich 2001), for the situation of membrane proteins, we 
simply do not have a sufficient number of high-resolution 
structures to once and for all estimate σ. There is no clean solution to 
this problem. Here, we used the following approximation: 

$$SE(Q) = \frac{1}{\sqrt{N_{\text{prot}X}}} \cdot \max \left\{ \sigma \;\middle|\; \text{all methods for set } X;\ \text{all sets } Y \text{ with } N_{\text{prot}Y} \approx N_{\text{prot}X} \right\} \tag{11}$$

that is, we used the maximal possible standard error. Assume that 
σ = 20 for a set of 13 proteins, σ = 10 for a set of 36 proteins, 
and σ = 15 for a set of 27 proteins. Then we used σ = 20 for the 
first, and σ = 15 for the other two. 

Estimating error for per-segment accuracy: bootstrap experiment. 
The above concept to estimate the error in evaluating performance 
is not applicable for the per-segment scores, because 
these are not distributed normally. To illustrate the problem for the 
topology prediction: scores can be 1 (correct topology) or 0 
(incorrect) for one protein. The score TOPO (eq. 5) averages over all 
proteins, hence provides one single final value, rather than a 
distribution. One way to still estimate the error in such a situation is 
the bootstrap experiment (Diaconis and Efron 1983; Efron et al. 
1996). The procedure is the following (Fig. 5): (1) Assume we 
have a set of N = 36 proteins, each with correct or incorrect 
topology. (2) Choose a random subset of K < N proteins, and 
compile the average (TOPO) over these K proteins. (3) Repeat M times 
and estimate the error based on the resulting distribution of 
averages. In other words, the bootstrapping experiment attempts to 
estimate how sensitively a score depends on a particular data set 
chosen. Albeit often surprisingly powerful, bootstrapping is a more 
coarse-grained approximation. In particular, we used the following 
parameters to estimate errors for per-segment scores: M = 100 
(100 random picks), and K = int(N/2); that is, for each random 
pick we chose half of the proteins available in the respective sets. 
Finally, we applied the same approximation as depicted in 
equation 11, that is, reported a rather conservative estimate for the 
error. 
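
The resampling procedure just described can be sketched as follows. This is a minimal sketch under stated assumptions: each protein contributes a score of 1 (correct) or 0 (incorrect), each random pick draws K = int(N/2) distinct proteins, and the spread of the M subset averages serves as the error estimate. The function name `bootstrap_error`, the fixed random seed, and the example counts are hypothetical.

```python
import random
import statistics


def bootstrap_error(scores, n_picks=100, seed=0):
    """Estimate mean and spread of a percentage score by random subsets.

    scores: one value per protein (e.g., 1 for correct topology, 0 otherwise).
    n_picks: number of random picks M.
    Returns (average, standard deviation) over the M subset averages.
    """
    rng = random.Random(seed)  # seeded only to make the example repeatable
    k = len(scores) // 2       # K = int(N/2)
    averages = []
    for _ in range(n_picks):
        subset = rng.sample(scores, k)  # K distinct proteins per pick
        averages.append(100.0 * sum(subset) / k)
    return statistics.mean(averages), statistics.stdev(averages)


if __name__ == "__main__":
    # Hypothetical set: 36 proteins, 27 of them with correct topology.
    topology_correct = [1] * 27 + [0] * 9
    mean, error = bootstrap_error(topology_correct, n_picks=100)
    print(f"TOPO = {mean:.1f} +/- {error:.1f}")
```

Drawing subsets without replacement follows the wording "random subset of K < N proteins"; a classical bootstrap would instead resample N items with replacement, which is a different design choice.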

Ranking methods. Given methods A and B evaluated on a set 
with N proteins, when can we conclude that the performance of A 
($Q(A)$) is significantly better than that of B ($Q(B)$)? The error 
estimates provide an answer to this question: We cannot 
distinguish between A and B if: 

$$\Delta Q = Q(A) - Q(B) \le SE(Q) \tag{12}$$

Thus, we can rank only if A and B differ by more than the error. 
For example, when a method correctly predicts 75% of the residues 
in a test set of 16 proteins with a standard deviation of 10%, 
a difference relative to another method that is smaller than 2.5% 
(i.e., ΔQ = 10/sqrt[16]) is not significant. Thus, we cannot 
distinguish between two methods that predict correctly 75% and 73% of 
all residues, respectively. We used this estimate to rank methods in 
the following way. Assume four methods have accuracy levels of 
A = 75, B = 73, C = 71, and D = 68. D can be distinguished 
from all other methods (ΔQ > 2.5 to all). Hence, it ranks last. C can 
be distinguished from A (ΔQ = 4 > 2.5). However, A cannot be 
distinguished from B (ΔQ = 2 < 2.5), and B cannot be 
distinguished from C (ΔQ = 2 < 2.5). This situation results in a 
dilemma that has four different possible solutions: (I) A, B, and C 
get the same rank, ascertaining that no two methods are ranked 
differently that cannot be distinguished. (II) A and B get rank 1, 
and C rank 2, ensuring that no two methods are ranked equally that 
can be distinguished. (III) A gets rank 1, B rank 2, and C rank 3, 
ignoring that we cannot distinguish between A and B, nor between 
B and C. (IV) Do not rank. None of these solutions is correct. 
Here, we applied solutions (IV) and (I). For the example given, 
solution (I) implied that A, B, and C ranked first; D ranked second. 
However, this simplification ignored another intrinsically 
insurmountable problem: What if method A is significantly better than 
method B in terms of $Q_2$ and significantly worse in terms of $Q_{ok}$? 
Occasionally, the following ad hoc solution is presented to such a 
problem: Rank all methods on all scores and compile averages 
over ranks (Tables 3 and 5). 

Given: set with N samples 
Choose parameters: 
  • number of samples in subset K < N 
  • number of random picks M 
DO: 
  for each random pick m (m = 1, ..., M) 
    1: choose subset of K samples 
    2: compile average over Q on K = Q(m) 
  compile average and standard deviation over all Q(m) 

Fig. 5. Procedure for estimating error using a bootstrap experiment. Given 
a data set with N items, one first defines K, which is the number of items 
one will select from the original data set, and M, which is the number of 
times one will choose a sample of size K. For instance, if the data set is of 
size 36, then one defines K < 36. Once K and M are defined, one selects a 
sample of size K and calculates the average value for the appropriate 
metric. Repeating this process M times will yield M average values. One 
can then compile the averaged value and standard deviation for these M 
average values. 
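
The ranking rule of solution (I) for a single score can be written out as a short Python sketch. It assumes that methods are sorted by score and that a new rank starts only where two neighbors differ by more than the standard error, so that any chain of mutually indistinguishable methods shares one rank. The function names are hypothetical, and the sketch covers one score only, not the averaging over ranks across several scores.

```python
import math


def standard_error(sigma, n_proteins):
    """Rule-of-thumb standard error: sigma / sqrt(N)."""
    return sigma / math.sqrt(n_proteins)


def rank_solution_one(scores, se):
    """Assign ranks so that indistinguishable methods share a rank.

    scores: dict mapping method name to its accuracy.
    se: standard error; differences <= se are treated as not significant.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    rank = 0
    previous = None
    for name, value in ordered:
        # Open a new rank only if this method is clearly worse than
        # the method directly above it.
        if previous is None or previous - value > se:
            rank += 1
        ranks[name] = rank
        previous = value
    return ranks


if __name__ == "__main__":
    se = standard_error(10.0, 16)  # 2.5 for the example in the text
    print(rank_solution_one({"A": 75, "B": 73, "C": 71, "D": 68}, se))
```

For the four-method example this reproduces the outcome stated above: A, B, and C share the first rank and D ranks second.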

Electronic supplemental material 

All data sets and a few additional results are available through 
our Web site at: http://cubic.bioc.columbia.edu/papers/2002_htm_ 
eval/data. 
Synthetic Peptides for Production of 
Antibodies that Recognize Intact Proteins 



UNIT 11.16 



Antibodies that recognize intact proteins can be produced through the use of synthetic 
peptides based on short stretches of the protein sequence, without first having to isolate 
the protein. The procedure for selecting stretches of protein sequence likely to be antigenic 
is relatively straightforward. However, no procedure will identify a single sequence 
guaranteed to be effective, nor will it usually identify the best single sequence to use. 
Rather, several sequences will be identified that have a higher-than-average probability 
of producing an effective antigen. 

The steps to produce an effective antibody include: (1) designing the peptide sequence 
based on the sequence of the protein; (2) synthesizing the peptide; (3) preparing the 
immunogen either by coupling the synthetic peptide to a carrier protein or through the 
use of a multiple antigenic peptide (MAP); (4) immunizing the host animal; (5) assaying 
antibody titer in the host animal's serum; and (6) obtaining the antiserum and/or isolating 
the antibody. This unit covers steps 1 and 3; step 2 requires a laboratory with expertise in 
peptide synthesis. Peptide synthesis services are widely available both academically and 
commercially. 

The best method to select potentially effective sequences is via a computer-assisted 
strategy (see Basic Protocol 1). An alternative manual method is also described (see 
Alternate Protocol 1) but is not recommended to replace the use of algorithms if there is 
a choice. A small synthetic peptide is usually insufficiently immunogenic on its own, and 
two methods have been developed to solve this problem. The first (see Basic Protocol 2) 
involves chemically coupling the synthetic peptide to a carrier protein to boost the immune 
response. The second method (see Alternate Protocol 2) entails direct synthesis of a MAP 
covalent multimer of the simple peptide sequence. Both methods have proven effective 
and it is a matter of personal preference which to use. Coupling to a carrier protein requires 
additional chemical manipulations after synthesis of the peptide, while the MAP is 
complete and ready for immunization at the conclusion of the synthetic protocol. 
Disadvantages of MAPs are that they are more difficult to produce homogeneously and 
to analyze postsynthetically. They also may be more prone to insolubility problems. 

A carrier protein is a relatively large molecule capable of stimulating an immune response 
independently. A synthetic peptide coupled to a carrier protein acts as a hapten and 
produces antibodies specific for the hapten (antibodies against the carrier protein are also 
produced). The most commonly used carrier proteins are keyhole limpet hemocyanin 
(KLH) and bovine or rabbit serum albumin (BSA or RSA). KLH is usually preferred, 
because it tends to elicit a stronger immune response and is evolutionarily more remote 
from mammalian proteins. A common problem with KLH, however, has been its solubil- 
ity. Pierce Chemical Company sells a preparation of KLH purported to have better 
solubility properties (see below). 

Alternatively, peptides can be coupled to carrier proteins through either their amino (see 
Alternate Protocol 3) or carboxyl groups (see Alternate Protocol 4). These two alternate 
protocols are not recommended as a first choice for coupling, but are included because 
they have been used successfully and may be advantageous for certain special applications 
discussed in the Commentary. Also presented are methods for assaying free sulfhydryl 
content and for reducing disulfide bonds in synthetic peptides (see Support Protocols 1 
and 2). 
Once the coupling procedure has been performed, it is possible to determine the approxi- 
mate degree of coupling by amino acid analysis (see Support Protocol 3). However, in 
most instances this is unnecessary and the product can be used directly. 



BASIC PROTOCOL 1 
COMPUTER-ASSISTED SELECTION OF APPROPRIATE ANTIGENIC PEPTIDE SEQUENCES 

An antibody produced in response to a simple linear peptide will most likely recognize a 
linear epitope in a protein. Furthermore, that epitope must be solvent-exposed to be 
accessible to the antibody. The general features of protein structure that correspond to 
these criteria are turns or loop structures, which are generally found on the protein surface 
connecting other elements of secondary structure, and areas of high hydrophilicity, 
especially those containing charged residues. As a consequence, computer algorithms that 
predict protein hydrophilicity and tendency to form turns are very useful. Several analytic 
programs or algorithms that attempt to do this have been developed. Although the choice 
of method may rely on availability or personal preference, there tends to be a high level 
of agreement among them. As stated earlier, none of the methods will identify the one 
single sequence guaranteed to produce an effective antibody against any given protein. 
Rather, the methods will offer several good candidates, one or several of which can be 
used. 



Many of these algorithms may already be available on a local computer system. They are 
included in many commercial software packages such as GCG (Genetics Computer 
Group; see appendix 4). The ExPASy Web site of the University of Geneva offers free 
access to a variety of different programs over the Internet at http://expasy.org/tools. 

The following protocol utilizes the hydropathy index developed by Kyte and Doolittle 
(1982) and the secondary structure prediction method for β turns developed by Chou and 
Fasman (1974) found in the tool "Protscale" at the ExPASy Internet address. 

1. Using the selected algorithms, compute the hydropathy index and the tendency for 
β-turns of the protein sequence. Use a window size of 7 or 9 and give equal weight 
to each amino acid. Record the results in either graphical or numerical form, or both. 

As an example, the graphical representation of these results for the protein sequence shown 
in Figure 11.16.1 is presented in Figure 11.16.2. 

A window size determines the number of amino acids to be used in computing a value for 
the amino acid at the center of the window. For example, a window size of 9 includes 4 
amino acids on each side of the central amino acid. The value computed for the central 
amino acid is the simple average of the values for each amino acid in the window. 

        10         20         30         40         50         60
MAKVSLEKDK IKFLLVEGVH QKALESLRAA GYTNIEFHKG ALDDEQLKES IRDAHFIGLR
SRTHLTEDVI NAAEKLVAIG CFCIGTNQVD LDAAAKRGIP VFNAPFSNTR SVAELVIGEL 120
LLLLRGVPEA NAKAHRGVWN KLAAGSFEAR GKKLGIIGYG HIGTQLGILA ESLGMYVYFY 180
DIENKLPLGN ATQVQHLSDL LNMSDVVSLH VPENPSTKNM MGAKEISLMK PGSLLINASR 240
GTVVDIPALC DALASKHLAG AAIDVFPTEP ATNSDPFTSP LCEFDNVLLT PHIGGSTQEA 300
QENIGLEVAG KLIKYSDNGS TLSAVNFPEV SLPLHGGRRL MHIHENRPGV LTALNKIFAE 360
QGVNIAAQYL QTSAQMGYVV IDIEADEDVA EKALQAMKAI PGTIRARLLY

Figure 11.16.1 The amino acid sequence of a 410-residue protein analyzed by the method 
presented in Basic Protocol 1. The results are shown in Figure 11.16.2. 

2. Compare the results of the two analyses and look for areas of sequence that are high 
in turn tendency and high in hydrophilicity (low in hydrophobicity). 

In Figure 11.16.2, these areas correspond to positive peaks in the Chou-Fasman analysis 
and negative peaks in the Kyte-Doolittle analysis. The three best areas in terms of amplitude 
and correlation are shaded. These correspond to the sequences underlined in Figure 11.16.1. 
(Note the alignment of these peak optima as compared to the peaks around residue 300.) 




Figure 11.16.2 Graphical representation of the results generated by a computer algorithm for the 
sequence in Figure 11.16.1, analyzed by the method presented in Basic Protocol 1. The shaded 
areas represent three regions in the sequence meeting criteria for selection as potential 
immunogens. (A) Analysis for β-turns (Chou and Fasman, 1974). (B) Analysis for hydrophobicity 
(Kyte and Doolittle, 1982). Horizontal axis: residue number (50 to 400). 

3. Examine the sequences for glycosylation site motifs and discard any sequences that 
contain them unless it is known that the protein is not glycosylated. 

Amino acids in glycosylated regions may be shielded from presentation to an antibody by 
masking carbohydrates. 

Amino-linked carbohydrate chains can occur at Asn-X-Ser or Asn-X-Thr sequences. 
Hydroxyl-linked carbohydrate chains do not appear to have a set motif. A program to assist 
in the prediction of mucin-type GalNAc O-glycosylation sites in mammalian glycoproteins 
is found in the tool "NetOGlyc" at the ExPASy site (http://expasy.org/tools). However, before 
using it, read the documentation carefully and keep in mind that such prediction methods 
cannot always be successful. 

4. Select the best sequences resulting from this analysis to use as antigenic peptides. 
These are sequences where the largest positive values (peaks with positive deflection) 
for turn propensity correspond in position to the largest negative values (peaks with 
negative deflection) for hydrophobicity. The values obtained in these analyses are 
relative and dependent on the individual protein's composition, so it is not possible 
to set an arbitrary minimum value as a cutoff for rejecting a particular peak. Rather, 
always select the peaks of greatest magnitude in any given sequence. In addition, the 
immediate amino-terminal and carboxyl-terminal regions of proteins are often exposed 
to solvent. If these areas appear to be hydrophilic in nature, they are also 
acceptable candidates. Thus each analysis may provide several potential sequences. 
How many peptides to make (see Anticipated Results) is a matter of individual choice. 
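
The window averaging described in step 1 can be sketched in a few lines of Python. This is only an illustration, not the ProtScale implementation: the function name `window_profile` and the dictionary names are hypothetical. The Kyte-Doolittle hydropathy values and Chou-Fasman β-turn propensities are entered as they are commonly tabulated; they should be checked against Table 11.16.1 and the original references before use.

```python
# Kyte-Doolittle hydropathy values (assumed from the commonly cited 1982 scale).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Chou-Fasman beta-turn propensities (assumed from the commonly cited table).
CF_TURN = {
    "A": 0.66, "R": 0.95, "N": 1.56, "D": 1.46, "C": 1.19,
    "Q": 0.98, "E": 0.74, "G": 1.56, "H": 0.95, "I": 0.47,
    "L": 0.59, "K": 1.01, "M": 0.60, "F": 0.60, "P": 1.52,
    "S": 1.43, "T": 0.96, "W": 0.96, "Y": 1.14, "V": 0.50,
}


def window_profile(sequence, scale, window=9):
    """Return (position, mean value) pairs for each full window.

    position is the 1-based index of the central residue; every residue
    in the window carries equal weight, as in step 1.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    values = [scale[aa] for aa in sequence.upper()]
    profile = []
    for center in range(half, len(values) - half):
        chunk = values[center - half:center + half + 1]
        profile.append((center + 1, sum(chunk) / window))
    return profile


if __name__ == "__main__":
    # First 20 residues of the sequence in Figure 11.16.1.
    fragment = "MAKVSLEKDKIKFLLVEGVH"
    for pos, value in window_profile(fragment, KD_HYDROPATHY, window=9):
        print(pos, round(value, 2))
```

Residues closer than half a window to either end receive no value in this sketch, which is why the terminal regions have to be judged separately, as step 4 notes.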
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MANUAL INSPECTION TO SELECT APPROPRIATE PEPTIDE SEQUENCES 

If computer algorithms are not available, it is possible to select potential sequences by 
manual inspection. Although there is no evidence that a manual method is any less 
effective than the use of computer algorithms, there is a greater probability of overlooking 
potentially important areas of sequence. It is therefore recommended that computer 
analysis be used whenever it is available. Although it can be done, it would be very time 
consuming and labor intensive to manually calculate values for every overlapping peptide 
offset by a single amino acid in the same way that the algorithms do. For this reason, areas 
rich in polar residues are selected for manual calculation of hydrophilicity and turn 
propensity. 

1. Visually inspect the protein sequence and select areas that contain at least two to three 
charged residues (Lys, Arg, His, Asp, Glu) within a 10- to 15-residue span. 

If this criterion cannot be met, select sequences with the greatest number of charged 
residues. 

2. From the sequences identified in step 1, select a subset of sequences that are the 
highest in Ser, Thr, Asn, Gln, Pro, and Tyr content. 

3. Calculate average hydrophilicity and turn propensity for each amino acid in the 
selected sequences using the values given in Table 11.16.1 and a window of 9 residues 
(see Basic Protocol 1, step 1). 

Be sure to include the residues flanking the selected sequence for calculation of values for 
the residues at the ends of the selected sequence. In other words, do not use different size 
windows. 

4. Plot the values for each amino acid of a chosen sequence. 

Sequences whose optimal values for hydrophilicity and turn propensity correspond (as in 
Fig. 11.16.2) are considered good candidates. 

5. Inspect sequences for glycosylation motifs and discard these candidates (see Basic 
Protocol 1, step 3). 

Amino acids of glycosylated regions may be masked in native proteins, so an antibody 
raised against them would be ineffective. 

6. Select the best sequences (see Basic Protocol 1, step 4 for criteria), choosing a high 
turn-propensity-to-hydrophobicity ratio. 

Table 11.16.1 Hydrophobic and β-Turn Indices of Amino Acids 

Amino acid       Symbols    Hydrophobicity value(a)    β-Turn propensity(b)
Glycine          Gly (G)    -0.4                       1.56
Alanine          Ala (A)     1.8                       0.66
Methionine       Met (M)     1.9                       0.60
Cysteine         Cys (C)     2.5                       1.19
Phenylalanine    Phe (F)     2.8                       0.60
Leucine          Leu (L)     3.8                       0.59
Valine           Val (V)     4.2                       0.50
Isoleucine       Ile (I)     4.5                       0.47

(a) Kyte and Doolittle (1982). 
(b) Chou and Fasman (1974). 
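
The screening parts of the manual protocol (steps 1, 2, and 5) lend themselves to a short script. The sketch below is an assumption-laden illustration rather than part of the protocol: the function names are hypothetical, the span is fixed at 15 residues, and the threshold of three charged residues is taken from the upper end of the "two to three" range in step 1. The glycosylation pattern follows the text literally (any residue at X); many references additionally exclude proline at X, which is not done here.

```python
import re

CHARGED = set("KRHDE")      # Lys, Arg, His, Asp, Glu (step 1)
POLAR_TURN = set("STNQPY")  # Ser, Thr, Asn, Gln, Pro, Tyr (step 2)


def charged_windows(sequence, span=15, min_charged=3):
    """List spans that satisfy step 1.

    Each hit is (start, segment, n_charged, n_polar), where start is the
    1-based position of the first residue of the span.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - span + 1):
        segment = seq[start:start + span]
        n_charged = sum(aa in CHARGED for aa in segment)
        if n_charged >= min_charged:
            n_polar = sum(aa in POLAR_TURN for aa in segment)
            hits.append((start + 1, segment, n_charged, n_polar))
    return hits


def n_glycosylation_motifs(segment):
    """Return 1-based positions of Asn in Asn-X-Ser or Asn-X-Thr motifs.

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=.[ST])", segment.upper())]


def screen(sequence, span=15, min_charged=3):
    """Apply steps 1, 2, and 5: keep motif-free spans, richest in polar residues first."""
    kept = [hit for hit in charged_windows(sequence, span, min_charged)
            if not n_glycosylation_motifs(hit[1])]
    return sorted(kept, key=lambda hit: hit[3], reverse=True)


if __name__ == "__main__":
    for start, segment, n_charged, n_polar in screen("MAKVSLEKDKIKFLLVEGVHQKALESLRAA")[:3]:
        print(start, segment, n_charged, n_polar)
```

The surviving spans still need the windowed hydrophilicity and turn calculations of steps 3 and 4 before a final choice is made.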



DESIGNING A SYNTHETIC PEPTIDE FOR COUPLING TO 
A CARRIER PROTEIN 

Although there is no direct evidence to show that the state of the termini of the peptide 
affects its ability to produce antibodies that will react with the protein, most procedures 
suggest that the termini of the peptide should mimic their native state. Thus, sequences 
whose terminal residues normally are in peptide linkage in the protein can have their 
amino-terminal and carboxyl-terminal groups modified by acetylation and amidation, 
respectively, during synthesis. 

Modification of the amino or carboxyl termini will decrease the polarity of the peptide in 
solution and could have a significant effect on the peptide's solubility. If the peptide lacks 
sufficient protonatable side chains, modification of the termini can be omitted. A general 
rule to predict solubility is that the total number of charges at a given pH should be at 
least 20% of the number of residues in the peptide. 
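
The 20% rule of thumb above can be checked quickly for a candidate peptide. The following sketch assumes near-neutral pH, at which Lys and Arg are counted as positively charged and Asp and Glu as negatively charged; whether to count His, and the function names themselves, are assumptions made for illustration. Unmodified termini each contribute one charge, while acetylated or amidated termini contribute none.

```python
def charge_fraction(peptide, free_n_terminus=True, free_c_terminus=True,
                    count_his=False):
    """Charged groups per residue, assuming near-neutral pH."""
    seq = peptide.upper()
    charged = sum(aa in "KRDE" for aa in seq)
    if count_his:
        # His is only partially protonated near neutral pH, so it is optional.
        charged += seq.count("H")
    charged += int(free_n_terminus) + int(free_c_terminus)
    return charged / len(seq)


def likely_soluble(peptide, threshold=0.20, **kwargs):
    """Apply the rule of thumb: charges should be at least 20% of residues."""
    return charge_fraction(peptide, **kwargs) >= threshold


if __name__ == "__main__":
    peptide = "MAKVSLEKDK"
    print(charge_fraction(peptide))
    print(charge_fraction(peptide, free_n_terminus=False, free_c_terminus=False))
```

Comparing the two calls shows how much of the peptide's charge is lost when both termini are modified, which is the situation in which the text suggests leaving them unmodified.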

1. Choose a sequence of 10 to 15 amino acid residues for the synthetic peptide. 

Longer peptides are more difficult and expensive to make, and they are usually unnecessary. 

REAGENTS AND SOLUTIONS 

Use Milli-Q-purified water or equivalent for the preparation of all buffers. For common stock solutions, 
see APPENDIX 2; for suppliers, see APPENDIX 4. 

Cysteine standard stock solution 

Dissolve 26.3 mg cysteine hydrochloride monohydrate in 100 ml of 0.1 M sodium 
phosphate, pH 8.0 (appendix 2). Prepare immediately before use. 

Ellman's reagent solution 

Dissolve 4 mg Ellman's reagent, 5,5'-dithio-bis-(2-nitrobenzoic acid) (Pierce), in 1 
ml of 0.1 M sodium phosphate, pH 8.0 (appendix 2). Prepare immediately before use. 

Glutaraldehyde solution, 0.15% 

Add 30 µl of 25% aqueous glutaraldehyde solution to 5 ml of 50 mM sodium borate 
buffer, pH 8.0 (pH adjusted with HCl). Prepare fresh and use immediately. If the 
glutaraldehyde precipitates, check the pH. It should not be above 8.0; a slightly lower 
pH can be used (pH 7 to 8). 

CAUTION: Glutaraldehyde is a sensitizing agent that should be handled in a hood and only 
according to the recommendations in the Material Safety Data Sheet. When mixing solutions 
or performing reactions, keep the container covered to prevent vapors from escaping into 
the atmosphere. 

Guanidine-HCl, 6 M 

Dissolve 1 g guanidine-HCl in 1 ml of 0.05 M sodium phosphate, pH 7.0 (appendix 
2). Store up to several weeks at room temperature. 

The resulting 1.8-ml solution should be ~0.025 M phosphate/6 M guanidine-HCl at pH 7.0. 

COMMENTARY 

Background Information 

Synthetic peptides are linear arrays of amino acids that in most instances possess a random 
structure in solution. While it is not difficult to produce antipeptide antibodies, it does not 
necessarily follow that the antibodies will recognize a protein containing the same stretch of 
sequence found in the peptide. In order for this to occur, the amino acids in the protein must be 
oriented to the antibody in a way similar to that of the synthetic peptide. This generally requires 
three basic features of the protein: (1) that the stretch of sequence be exposed to solvent; (2) 
that the sequence be a continuous stretch of amino acids; and (3) that it not possess a 
higher-order structure that renders it unrecognizable by the antibody population. 

The large number of model protein structures now available indicates that almost all of 
the ionized groups in water-soluble proteins are on the protein surface. Asp, Glu, Lys, and Arg 
residues, on average, comprise 27% of the protein surface and only ~4% of the protein 
interior. The fraction of residues that are at least 95% buried ranges from 0.36 to 0.60 for 
nonpolar residues and 0.01 to 0.23 for polar residues. Only 1% of Arg and 3% of Lys residues 
fall into the 95% buried range (Creighton, 1993). Therefore, it is reasonable to expect 
solvent-exposed areas of proteins to display relatively high levels of polar and charged 
residues, particularly Arg and Lys. 

Proteins display three kinds of secondary structure: α-helices, β-sheets, and turns or 
loops. Turns or loops generally connect elements of α-helices and β-sheets, and can either 
fit one of several rather strict motifs with recognizable hydrogen bonding patterns or be of 
a more extended, random nature. These turn or loop structures appear to be most useful for 
antibody production because they tend to be found on the surface of proteins connecting 
larger arrays of helices and sheets, and they consist of continuous stretches of amino acids. 
Although many amino acid residues in helices and sheets are also exposed at the surface, the 
regular geometry of amino acids contained within them makes them less suitable for this 
purpose. For instance, in β-sheet structures the side chain of each successive amino acid in the 
β-sheet strand points in the opposite direction to the ones immediately preceding and following 
it. Thus, even if the amino acid side chains are not predominantly buried in the interior of 
the protein, only every other side chain is exposed on the same surface of the sheet. This can 
hinder recognition by an antibody produced with a linear peptide capable of assuming a 
more random structure. A similar situation exists for α-helices. Although the change in 
direction of the side chains of successive amino acids is perhaps not as abrupt as in β-sheets, 
only approximately every third or fourth side chain is found on the same surface of the helix. 
Epitopes in proteins have been identified in amphipathic helices, but unless the synthetic 
peptide assumes a similar helical structure in solution, recognition by the antibody may be 
problematic. 

These considerations have led to more useful methods for predicting sequences that will 
produce antibodies recognizing intact proteins. A variety of different indices that predict 
hydrophilicity or hydrophobicity and secondary structure are available. In addition, predictive 
methods based on segmental mobility, side chain accessibility, and sequence variability 
(see Van Regenmortel et al., 1988) have also been proposed. All of these methods generally 
tend to yield similar results, but it must be noted that these procedures were developed for (and 
work best with) water-soluble proteins composed of a single globular structure. Additional 
complications can arise with multisubunit proteins, where normally exposed structures may 
be shielded by subunit interactions, or membrane proteins with large sections shielded 
from the solvent. 

The method presented in this unit utilizes the correlation between the hydrophilic 
character of a peptide sequence (Kyte and Doolittle, 1982) and its propensity to form β-turn 
structures (Chou and Fasman, 1974). Free access to these and many other algorithms is provided 
at the ExPASy Web site of the University of Geneva at http://expasy.org/tools. 

After selection of the peptide sequence, an effective immunogen is generally produced by 
coupling the peptide to a carrier protein or by synthesizing a multiple antigenic peptide 
(MAP), with four or eight identical peptides assembled simultaneously on the α and ε 
amines of the terminal lysines of a branched core (see Fig. 11.16.3). 

Critical Parameters 

Analyzing protein sequences with algorithms or tables of assigned values for amino 
acids is a well-established procedure, but evaluating these results and selecting the candidate 
sequences requires some consideration. To take full advantage of the results, choose areas of 
sequence that give the maximum values for the properties being evaluated and that also show 
the highest degree of residue-by-residue correlation. In other words, choose areas of 
maximum amplitude where the centers of the peaks correspond to the same sequence with a 
divergence of no more than two to three residues. Examples of this are given in Figure 11.16.2, 
which shows results from the method presented in Basic Protocol 1 for the sequence shown in 
Figure 11.16.1. The top panel in Figure 11.16.2 predicts β-turns as calculated by the method of 
Chou and Fasman (1974). The bottom panel is a prediction of hydrophobicity using the 
parameters of Kyte and Doolittle (1982). The data are analyzed by looking for areas of high turn 
propensity (maximum positive deflection in the top panel) and high hydrophilicity (maximum 
negative deflection in the bottom panel). The shaded areas in Figure 11.16.2 designate three 
segments that meet these criteria. Note that the maximum and minimum values of these three 
stretches of protein sequence correlate very well. Additional areas of high hydrophilicity 
(bottom panel) are found near residues 64, 132, 137, 149, and 345, although the β-turn values 
of these secondary candidates are not as high as those of the three shaded areas. Two equally 
hydrophilic areas at residues 49 and 299 correspond to downward deflections in the β-turn 
profile and are thus not good candidates based on this analysis. 
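
The peak-matching criterion described above (turn maxima and hydropathy minima whose centers lie within two to three residues of each other) can be expressed as a small routine. This is a sketch under stated assumptions, not the procedure used to prepare Figure 11.16.2: the function names are hypothetical, each profile is assumed to be a list of (position, value) pairs, and the ranking score (turn value minus hydropathy value) is an arbitrary choice that weights the wider-ranging hydropathy scale more heavily.

```python
def local_extrema(profile, kind="max"):
    """Return (position, value) points that are local maxima or minima."""
    sign = 1 if kind == "max" else -1
    found = []
    for i in range(1, len(profile) - 1):
        prev_v, v, next_v = (sign * profile[j][1] for j in (i - 1, i, i + 1))
        # Strictly above the left neighbor so a plateau is reported once.
        if v > prev_v and v >= next_v:
            found.append(profile[i])
    return found


def matched_peaks(turn_profile, hydropathy_profile, max_offset=3):
    """Pair each turn maximum with the deepest hydropathy minimum nearby.

    Returns (turn_pos, hydropathy_pos, turn_value, hydropathy_value) tuples,
    best candidates first.
    """
    turn_peaks = local_extrema(turn_profile, "max")
    troughs = local_extrema(hydropathy_profile, "min")
    pairs = []
    for t_pos, t_val in turn_peaks:
        nearby = [(h_pos, h_val) for h_pos, h_val in troughs
                  if abs(h_pos - t_pos) <= max_offset]
        if nearby:
            h_pos, h_val = min(nearby, key=lambda point: point[1])
            pairs.append((t_pos, h_pos, t_val, h_val))
    # Illustrative score: high turn propensity and low hydropathy rank first.
    return sorted(pairs, key=lambda p: p[2] - p[3], reverse=True)


if __name__ == "__main__":
    turn = [(1, 0.8), (2, 1.2), (3, 0.9), (4, 0.7), (5, 1.0), (6, 0.6)]
    hydro = [(1, 0.5), (2, -1.0), (3, -2.0), (4, 0.0), (5, 1.0), (6, 2.0)]
    print(matched_peaks(turn, hydro, max_offset=3))
```

Because the values are relative to each protein's composition, the routine only ranks candidates within one sequence and applies no absolute cutoff.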

Many different chemistries are available for coupling synthetic peptides to carrier proteins 
to produce effective immunogens (Van Regenmortel et al., 1988). In many cases, however, 
side reactions or incompatibilities in chemistry between the coupling agent and the residues 
present in the peptide can be problematic. In order to simplify the process and present the 
greatest probability of success in most cases, only a few coupling methods are presented in 
this unit. In this regard, the recommended coupling procedure is cross-linking of the peptide 
via cysteine residues to keyhole limpet hemocyanin (KLH) with the heterobifunctional 
reagent m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS; see Basic Protocol 3). This 
effective method has enjoyed great success and can be used for virtually any peptide. The one 
caveat is that it is not recommended for peptides with internal cysteine residues, since they will 
also link to the carrier. It is also critical, when coupling with MBS through an added terminal 
cysteine residue, that the sulfhydryl group of the peptide be present in the free or reduced 
form (see Support Protocols 1 and 2). 

In addition to MBS coupling, other procedures commonly used (see Alternate Protocols 
3 and 4) are included as alternatives for use in special situations, but these are not 
recommended as a general alternative to MBS because they are more restrictive and have the 
potential for undesirable side reactions. Glutaraldehyde coupling (see Alternate Protocol 3) 
should not be used with peptides containing internal Lys, Cys, Tyr, or His residues and, since 
it is a homobifunctional reagent, cross-linking of the peptide to itself and the carrier to itself 
can occur. The latter lowers antigenicity and can result in extensive aggregation and 
precipitation of the carrier. 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC; see Alternate 
Protocol 4) is a water-soluble carbodiimide and should not be used with peptides containing 
internal Lys, Glu, Asp, Tyr, or Cys residues. Alternate Protocol 5 describes a simple 
photochemical coupling strategy (Gorka et al., 1989). 

Another good alternative for most peptides is 
the production of a multiple antigenic peptide 
(MAP; see Alternate Protocol 2). With this 
method the composition of the peptide is not a 
concern beyond its potential solubility properties. 
In most cases, since hydrophilic sequences are 
selected, this also is not a major problem. Both 
four- and eight-branched MAPs have been found 
to be effective. However, four-branched MAPs 
are recommended because they are less prone to 
synthesis problems and are easier to characterize. 

As with any synthetic peptide, the product must be well characterized before use. If the 
peptide is not what it was intended to be, this decreases the probability of generating 
antibodies that will recognize the protein. At the very least, check synthetic peptides for 
homogeneity by analytical HPLC and correct mass by mass spectrometry (see units 10.21 & 10.22). 
Characterization of MAPs can be more problematic due to their multibranched nature (Mints et 
al., 1997): HPLC and mass spectrometric analysis can be compromised by the presence of four 
to eight peptide chains per molecule, each of which may have only a small percentage of 
modification at any particular residue but which in the aggregate contribute to broad spectra. 
However, this feature of MAPs usually does not compromise their ability to form antigens of the 
proper peptide, since the correct sequence is usually present in high enough concentration that a 
significant amount of specific antibody is produced among the polyclonal population. 
Amino acid analysis (unit 10.1B), which is less sensitive to multiple small differences, tends to 
give a reasonable assessment of MAP integrity. 

Anticipated Results 

The methods outlined in this unit produce an effective polyclonal antiserum against an 
intact protein from a single peptide sequence ~50% to 70% of the time. Therefore, it is 
advisable to prepare two or three different peptides from a given protein to increase the 
probability of at least one of them being effective. 
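
The benefit of preparing several peptides can be estimated with a one-line calculation. The sketch below assumes that each peptide succeeds independently with the same probability, which the text does not state and which may not hold for peptides drawn from the same protein; the function name is hypothetical.

```python
def prob_at_least_one(p_single, n_peptides):
    """Chance that at least one of n independent peptides yields a useful antiserum."""
    return 1.0 - (1.0 - p_single) ** n_peptides


if __name__ == "__main__":
    for p in (0.5, 0.7):
        for n in (1, 2, 3):
            print(p, n, round(prob_at_least_one(p, n), 3))
```

Under that independence assumption, a 50% per-peptide success rate rises to 75% with two peptides and to 87.5% with three.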

Time Considerations 

Computer-assisted analysis of a protein sequence and inspection of the data to select 
several candidate sequences takes from 5 to 30 min. Manual analysis of a protein sequence can 
take several hours but can certainly be accomplished in <1 day. Selection of peptide design 
and manner of synthesis as well as selection of a coupling method will take <1 hr. Actual 
preparation of the peptide can be accomplished in 3 to 4 days, but this may vary depending on the 
turnaround time of the synthetic laboratory. Coupling a synthetic peptide to a carrier protein 
takes from 1 to 2 days. Although not covered in this unit, production of the antisera will vary 
with the animal and protocol used, but generally requires 2 to 3 months. It is therefore 
advisable to inject several animals with different peptides at one time. 